Abstract

Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic
relationships between 3D shapes in collections, and applying this analysis to support intelligent modeling, editing,
and visualization of geometric data. In contrast to traditional approaches, a key feature of data-driven approaches
is that they aggregate information from a collection of shapes to improve the analysis and processing of individual
shapes. In addition, they are able to learn models that reason about properties and relationships of shapes
without relying on hard-coded rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to shape classification, segmentation,
matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis, through reviewing
the literature and relating the existing works with both qualitative and numerical comparisons. We conclude our
report with ideas that can inspire future research in data-driven shape analysis and processing.

Categories and Subject Descriptors (according to ACM CCS): I.3.3 [Computer Graphics]: Picture/Image
Generation—Line and curve generation 


1. Introduction 

As big geometric data is becoming more available (e.g.,
from fast and commodity 3D sensing and crowdsourcing
shape modeling), the interest in processing of 3D shapes
and scenes has been shifting towards data-driven techniques.
These techniques leverage data to facilitate high-level shape
understanding, and use this analysis to build effective tools
for modeling, editing, and visualizing geometric data. In
general, these methods start by discovering patterns in the
geometry and structure of shapes, and then relate them to
high-level concepts, semantics, function, and models that explain
those patterns. The learned patterns serve as strong priors
in various geometry processing applications. In contrast to
traditional approaches, data-driven methods analyze a set of
shapes jointly to extract and model meaningful mappings
and correlations in the data, and learn priors directly from
the data instead of relying on hard-coded rules or explicitly
programmed instructions.

The idea of utilizing data to support geometry processing
has been exploited and practiced for many years. However,
most existing works based on this idea are confined to
the example-based paradigm, thus mostly leveraging only one
core concept of data-driven techniques: information transfer.
Typically, the input to these problems includes one or
multiple exemplar shapes with prescribed or precomputed
information of interest, and a target shape that needs to be
analyzed or processed. These techniques usually establish
a correlation between the source and the target shapes and
transfer the information of interest from the source to the
target. Applications of this approach include a variety
of methods in shape analysis (e.g., [SY07]) and shape
synthesis (e.g., [Mer07, MHS*14]).

As the number of available 3D shapes becomes significantly
large, geometry processing techniques supported by
these data go through a fundamental change. Several new
concepts emerge in addition to information transfer, opening
space for developing new techniques for shape analysis
and content creation. In particular, the rich variability of 3D
content in existing shape repositories makes it possible to
directly reuse the shapes or parts for constructing new 3D
models [FKS*04]. Content reuse for 3D modeling is perhaps
the most straightforward application of big 3D geometric
data, providing a promising approach to address the challenging
3D content creation problem. In addition, high-level
understanding of shapes can benefit from co-analyzing
collections of shapes. Several analysis tools demonstrate that
shape analysis is more reliable if it is supported by observing
certain attributes in a set of semantically related shapes
instead of a single object. Co-analysis requires a critical
step of finding the correlation between multiple shapes in
the input set, which is substantially different from building
pair-wise correlation. A key concept in co-analysis is the
consistency of the correlations in the entire set, which has
both semantic [KHS10, SvKK*11, WAvK*12] and mathematical
[HG13] justifications.

Figure 1: Data-driven shape processing and modeling provides
a promising solution to the development of “big 3D
data”. Two major ways of 3D data generation, 3D sensing
and 3D content creation, populate 3D databases with a
fast-growing number of 3D models. The database models
are sparsely enhanced with manual segmentation and labeling,
as well as reasonably organized, to support data-driven
shape analysis and processing, based on, e.g., machine
learning techniques. The learned knowledge can in
turn support efficient 3D reconstruction and 3D content creation,
during which the knowledge can be transferred to the
newly generated data. Such 3D data with semantic information
can be included into the database to enrich it and facilitate
further data-driven applications.

Relation to knowledge-driven shape processing. Prior to
the emergence of data-driven techniques, high-level shape
understanding and modeling was usually achieved with
knowledge-driven methods. In the knowledge-driven paradigm,
geometric and structural patterns are extracted and interpreted
with the help of explicit rules or hand-crafted parameters.
Examples include heuristics-based shape segmentation
[Sha08] and procedural shape modeling [MWH*06].
Although these approaches find certain empirical success,
they exhibit several inherent limitations. First, it is extremely
hard to hard-code explicit rules and heuristics that can handle
the enormous geometric and structural variability of 3D
shapes and scenes in general. As a result, knowledge-driven
techniques are unlikely to generalize successfully to large
and diverse shape collections. Another issue is that it is usually
hard for non-expert users to interact with knowledge-driven
techniques that require as input “low-level” geometric
parameters or instructions.

In contrast to knowledge-driven methods, data-driven techniques
learn representations and parameters from data. They
usually do not depend on hard-coded prior knowledge, and
consequently do not rely on hand-crafted parameters, which
makes these techniques more data-adaptive and thus leads to
significantly improved performance in many practical settings.
The success of data-driven approaches, backed by machine
learning techniques, heavily relies on the accessibility
of large data collections. Increasing the training set by orders
of magnitude has been shown to significantly improve the
performance of common machine learning algorithms [BB01].
Thus, the recent developments in 3D
modeling tools and acquisition techniques for 3D geometry,
as well as the availability of large repositories of 3D shapes (e.g.,
Trimble 3D Warehouse, Yobi3D, etc.), offer great opportunities
for developing data-driven approaches for 3D shape
analysis and processing.

Relation to structure-aware shape processing. This
report is closely related to the recent survey on
“structure-aware shape processing” by Mitra and co-workers
[MWZ*14], which concentrates on techniques for
structural analysis of 3D shapes, as well as high-level shape
processing guided by structure-preservation. In that survey,
shape structure is defined as the arrangement and relations
between shape parts, which is analyzed through identifying
shape parts, part parameters, and part relations. Each of
the three can be determined through manual assignment,
predefined model fitting, or data-driven learning.

In contrast, our report takes a very different
perspective: how the availability of big geometric data has
changed the field of shape analysis and processing. In particular,
we want to highlight several key distinctions. First,
data-driven shape processing goes beyond structure analysis.
For example, leveraging large shape collections may
benefit a wider variety of problems in shape understanding
and processing, such as parametric modeling of shape
space [ACP03], hypothesis generation for object and scene
understanding [ZSSS13, SLH12], and information transfer
between multi-modal data [WGW*13, SHM*14]. Data-driven
shape processing may also exploit the data-centered
techniques in machine learning such as sparse representation
[RR13] and feature learning [LBF13], which are
not pre-conditioned on any domain-specific or structural
prior beyond raw data. Second, even within the realm of
structure-aware shape processing, data-driven approaches
are arguably becoming the dominant branch due to their theoretical
and practical advantages, availability of large shape
repositories, and recent developments in machine learning.

Vision and motivation. With the emergence of “big data”,
many scientific disciplines have shifted their focus to data-driven
techniques. Although 3D geometry data is still far
from being as ubiquitous as some other data formats (e.g.,
photographs), the rapidly growing number of 3D models, recent
developments in fusing 2D and 3D data, and the invention of
commodity depth cameras have made the era of “big 3D
data” more promising than ever. At the same time, we expect
data-driven approaches to take one of the leading roles in
understanding and reconstruction of acquired 3D data, as well
as synthesis of new shapes. In summary, data-driven geometry
processing will close the loop from acquisition, analysis,
and processing to generation of 3D shapes (see Figure 1),
and will be a key tool for manipulating big visual data.

Figure 2: The general pipeline of data-driven geometry processing contains four major stages: data collection and
preprocessing, feature extraction (or feature learning), learning, and inference. The inference supports many applications which
produce new shapes or scenes through reconstruction, modeling, or synthesis. These new data, typically possessing labels for
shapes or parts, can be used to enrich the input datasets and enhance future learning tasks, forming a data-driven
geometry processing loop.

Recent years have witnessed a rapid development of data-driven
geometry processing algorithms, both in the computer
graphics and in the computer vision communities. Given the
research efforts and wide interest in the subject, we believe
many researchers would benefit from a comprehensive and
systematic survey. We also hope such a survey can stimulate
new theories, problems, and applications.

Organization. This survey is organized as follows. Section 2
gives a high-level overview of data-driven approaches
and classifies data-driven methods with respect to their
application domains. This section also provides two representative
examples for the reader to understand the general
workflow of data-driven geometry processing. The following
sections survey various data-driven shape processing
problems in detail. Finally, we conclude by listing a set of
key challenges and providing a vision on future directions.

Accompanying online resources. In order to assist the
readers in learning and leveraging the basic algorithms, we
provide an online wikipage [XKHK14], which collects tools
and source code, together with benchmark data for typical
problems and applications of data-driven shape processing. This
page will also provide links and data mining tools for obtaining
large data collections of shapes and scenes. The website
is intended to serve as a starting point for those who are conducting
research in this direction. We also expect it to benefit a wide
spectrum of researchers from related fields.

2. Overview 

In this section, we provide a high-level overview of the main
components and steps of data-driven approaches for processing
3D shapes and scenes. Although the pipelines of these
methods vary significantly depending on their particular
applications and goals, a number of components tend to be
common: the input data collection and processing, data
representations and feature extraction, learning, and inference.


Representation, learning and inference are critical components
of machine learning approaches in general [KF09].
In the case of shape and scene processing, each of these
components poses several interesting and unique problems
when dealing with 3D geometric data. These problems have
greatly motivated the research on data-driven geometry processing,
and in turn brought new challenges to the computer vision
and machine learning communities, as reflected by the
increasing interest in 3D visual data from these fields. Below,
we discuss particular characteristics and challenges of
data-driven 3D shape and scene processing algorithms. Figure 2
provides a schematic overview of the most common
components of these algorithms.

2.1. 3D data collection 

Shape representation. A main component of data-driven
approaches for shape and scene processing is data collection,
where the goal is to acquire a number of 3D shapes and scenes
depending on the application. When shapes and scenes are
captured with scanners or depth sensors, their initial
representation is in the form of range data or unorganized point
clouds. Several data-driven methods for reconstruction,
segmentation and recognition directly work on these representations
and do not require any further processing. On the other
hand, online repositories, such as the Trimble 3D Warehouse,
contain millions of shapes and scenes that are represented
as polygon meshes. A large number of data-driven
techniques are designed to handle complete shapes in the
form of polygon meshes created by 3D modeling tools or
reconstructed from point clouds. Choosing which representation
to use depends on the application. For example, data-driven
reconstruction techniques aim to generate complete
shapes and scenes from noisy point clouds with missing
data. The reconstructed shapes can then be processed
with other data-driven methods for categorization, segmentation,
matching and so on. Developing methods that can
handle any 3D data representation, as well as jointly reconstruct
and analyze shapes, is a potential direction for future
research that we discuss in Section 10.

When polygon meshes are used as the input representation,
an important aspect to consider is whether and how
data-driven methods will deal with possible “defects”, such
as non-manifold and non-orientable sets of polygons, inverted
faces, isolated elements, self-intersections, holes and
topological noise. The vast majority of meshes available in
online repositories have these problems. Although there are
a number of mesh repairing tools (see [CAK12] for a survey),
they may not handle all different types of “defects”,
and can take a significant amount of time to process each
shape in large datasets. To avoid the issues caused by these
“defects”, some data-driven methods uniformly sample the
input meshes and work on the resulting point-based representation
instead (e.g., [CKGK11, KLM*13]).

Figure 3: Pipeline of a supervised segmentation algorithm [KHS10]. Given a set of shapes with labeled parts, the points of
each shape are embedded in a common feature space based on their local geometric descriptors (a color is assigned to points
depending on their given part label). A classifier is learned to split the feature space into regions corresponding to each part
label. Given a test shape, its points (shown in grey) are first embedded in the same space. Then part labels are inferred for all
its points based on the learned classifier and an underlying structured probabilistic model (Section 4).
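As a minimal sketch of how a mesh can be sampled uniformly, assume the mesh is given as a list of vertex coordinates and a list of triangular faces (index triples). Faces are picked with probability proportional to their area, and a uniform point is then drawn inside each picked face with barycentric weights. The function names and the input layout are illustrative assumptions, not the procedure of the cited methods.

```python
import math
import random


def triangle_area(a, b, c):
    """Area of a 3D triangle: half the norm of the cross product of two edges."""
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def sample_mesh(vertices, faces, n_points, seed=0):
    """Return n_points samples distributed uniformly over the mesh surface.

    Assumes at least one face has non-zero area; connectivity defects such as
    non-manifold edges do not matter because faces are treated independently.
    """
    rng = random.Random(seed)
    areas = [triangle_area(*(vertices[i] for i in face)) for face in faces]
    # Larger faces must receive proportionally more samples.
    chosen = rng.choices(range(len(faces)), weights=areas, k=n_points)
    points = []
    for face_index in chosen:
        a, b, c = (vertices[i] for i in faces[face_index])
        r1, r2 = rng.random(), rng.random()
        s = math.sqrt(r1)  # the square root makes the density uniform in area
        w0, w1, w2 = 1.0 - s, s * (1.0 - r2), s * r2
        points.append(tuple(w0 * a[k] + w1 * b[k] + w2 * c[k] for k in range(3)))
    return points


# Usage: a unit square in the z = 0 plane, built from two triangles.
square_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
square_faces = [(0, 1, 2), (0, 2, 3)]
samples = sample_mesh(square_vertices, square_faces, n_points=200)
```

Because each face is handled on its own, the sketch never inspects mesh connectivity, which is why point sampling sidesteps many of the “defects” listed above.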

Datasets. Although it is desirable to develop data-driven
methods that can learn from a handful of training shapes or
scenes, this is generally a challenging problem in machine
learning [FFFP06]. Several data-driven methods in computer
vision have been particularly successful due to the use of
very large datasets that can reach the size of several millions
of images [TFF08]. In contrast, data-driven approaches for
3D shape and scene processing have mostly relied
on datasets that reach the order of a few thousand shapes so
far (e.g., Princeton Shape Benchmark [SMKF04], or datasets
collected from the web [KLM*13]). Online repositories contain
a large number of shapes, which can lead to the development
of methods that leverage datasets that are orders
of magnitude larger than the ones currently used. Another
possibility is to develop synthetic datasets. A notable example
is the pose and part recognition algorithm used in
Microsoft’s Kinect, which relies on 500K synthesized shapes of
human bodies in different poses [SFC*11]. In general, large
datasets are important to capture the enormous 3D shape and
scene variability, and can significantly increase the predictive
performance and usability of learning methods. A more
comprehensive summary of the existing online data collections
can be found on our wikipage [XKHK14].

2.2. 3D data processing and feature representation 

It is common to perform some additional processing on the
input representations of shapes and scenes before executing
the main learning step. The reason is that the input
representations of 3D shapes and scenes can have different
resolutions (e.g., number of points or faces), scale, orientation,
and structure. In other words, the input shapes and scenes
do not initially have any type of common parameterization
or alignment. This is significantly different from other domains,
such as natural language processing or vision, where
text or image datasets frequently come with a common
parameterization beforehand (e.g., images with the same number
of pixels and objects of consistent orientation).

To achieve a common parameterization of the input
shapes and scenes, one popular approach is to embed them
in a common geometric feature space. For this purpose a variety
of shape descriptors have been developed. These descriptors
can be classified into two main categories: global
shape descriptors that convert each shape to a feature vector,
and local shape descriptors that convert each point to
a feature vector. Examples of global shape descriptors are
Extended Gaussian Images [Hor84], 3D shape histograms
[AKKS99, CK10a], spherical functions [SV01], lightfield
descriptors [CTSO03], shape distributions [OFCD02], symmetry
descriptors [KFR04], spherical harmonics [KFR03],
3D Zernike moments [NK03], and bags-of-words created
out of local descriptors [BBOG11]. Local shape descriptors
include surface curvature, PCA descriptors, local
shape diameter [SSCO08], shape contexts [BMP02, KHS10,
KBLB12], spin images [JH99], geodesic distance features
[ZMT05], heat-kernel descriptors [BBOG11], and depth features
[SFC*11]. Global shape descriptors are particularly
useful for shape classification, retrieval and organization.
Local shape descriptors are useful for partial shape matching,
segmentation, and point correspondence estimation. Before
using any type of global or local descriptor, it is important
to consider whether the descriptor will be invariant
to different shape orientations, scales, or poses. In the
presence of noise and irregular mesh tessellations, it is important
to robustly estimate local descriptors, since surface
derivatives are particularly susceptible to surface and sampling
noise [KSNS07].
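To make the notion of a global descriptor concrete, the following is a minimal sketch in the spirit of the D2 shape distribution [OFCD02]: a histogram of distances between randomly chosen pairs of surface points. The bin count, the number of pairs, and the normalization by the largest sampled distance are assumptions made for this illustration and need not match the original formulation.

```python
import math
import random


def d2_descriptor(points, n_pairs=2000, n_bins=16, seed=0):
    """Histogram of pairwise point distances, returned as a feature vector.

    Distances do not change under rotation or translation, and dividing by
    the largest sampled distance removes the dependence on uniform scale.
    Assumes `points` holds at least two distinct 3D points.
    """
    rng = random.Random(seed)
    distances = []
    for _ in range(n_pairs):
        p, q = rng.sample(points, 2)  # two different points of the shape
        distances.append(math.dist(p, q))
    d_max = max(distances)
    histogram = [0] * n_bins
    for d in distances:
        bin_index = min(int(n_bins * d / d_max), n_bins - 1)
        histogram[bin_index] += 1
    return [count / n_pairs for count in histogram]


# Usage: the descriptor of a point set and of a copy scaled by a factor of 2.
rng = random.Random(1)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(100)]
scaled_cloud = [(2.0 * x, 2.0 * y, 2.0 * z) for x, y, z in cloud]
descriptor = d2_descriptor(cloud)
scaled_descriptor = d2_descriptor(scaled_cloud)
```

Every shape maps to a vector of the same length regardless of its number of points, which is what allows shapes of different resolution to be compared in one feature space.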

It is also common to use several different descriptors,
and let the learning step decide which ones are more
relevant for each class of shapes [KHS10]. A promising future
direction is to develop data-driven methods that learn
feature representations from raw 3D geometric data, inspired
by recent advances in deep learning [Ben09]. A similar
direction has already been explored in computer vision for 2D
images [YN10]. In 3D, some works attempt feature learning
on the volumetric representation of 3D shapes, which are essentially
3D images [LBF13]. A more popular approach is to apply
deep learning directly on the raw RGB-D data captured by a
depth camera [SHB*12, BSWR12, BRF14].

Instead of embedding shapes in a common geometric
feature space, several methods try to directly align
shapes in Euclidean space. We refer the reader to the survey
on dynamic geometry processing for a tutorial on rigid and
non-rigid registration techniques [CLM*12]. An interesting
extension of these techniques is to include the alignment
process in the learning step of data-driven methods, since
it is inter-dependent with other shape analysis tasks such as
shape segmentation and correspondences [KLM*13].

Some data-driven methods require additional processing
steps on the input. For example, learning deformation handles
or fully generative models of shapes usually relies on
segmenting the input shapes into parts with automatic algorithms
[HKG11, SvKK*11] and representing these parts with
surface abstractions [YK12] or descriptors [KCKK12]. To
decrease the amount of computation required during learning,
it is also common to represent the shapes as a set of
patches (super-faces) [HKG11], inspired by the computation
of super-pixels in image segmentation.

2.3. Learning and Inference 

The processed representations of shapes and scenes are used
to perform learning and inference for a variety of applications:
shape classification, segmentation, matching, reconstruction,
modeling, synthesis, scene analysis and synthesis.
The learning procedures vary significantly depending on
the application, thus we discuss them individually in each
of the following sections on these applications. As a common
theme, learning is viewed as an optimization problem
that runs on a set of variables representing geometric, structural,
semantic or functional properties of shapes and scenes.
There are usually one or more objective (or loss) functions
for quantifying preferences for different models or patterns
governing the 3D data. After learning a model from the
training data, inference procedures are used to predict values
of variables for new shapes or scenes. Again, the inference
procedures vary depending on the application, and are discussed
separately in the following sections. It is common
that inference itself is an optimization problem, and sometimes
is part of the learning process when there are latent
variables or partially observed input shape or scene data.
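The split between learning as loss minimization and inference as prediction can be sketched with a deliberately simple model. The example below uses logistic regression trained by gradient descent, which is chosen here only for illustration and is not a method singled out by this survey; the feature vectors stand in for hypothetical shape descriptors with binary labels.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(features, labels, steps=500, learning_rate=0.5):
    """Learning: gradient descent on the mean logistic loss over the training set."""
    dim, n = len(features[0]), len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            # Prediction error of the current model on one training example.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for k in range(dim):
                grad_w[k] += error * x[k] / n
            grad_b += error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


def infer(model, x):
    """Inference: predict the label of a new feature vector with the learned model."""
    weights, bias = model
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Usage with made-up two-dimensional descriptors for two classes.
train_features = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
train_labels = [0, 0, 1, 1]
model = train(train_features, train_labels)
prediction_a = infer(model, [0.15, 0.15])
prediction_b = infer(model, [0.85, 0.85])
```

Here the objective is a single loss function and inference is a closed-form evaluation; in the structured models discussed in later sections, inference is itself an optimization over many coupled variables.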

A general classification of the different types of algorithms
used in data-driven approaches for shape and scene
processing can be derived from the type of input information
available during learning:

• Supervised learning algorithms are trained on a set of
shapes or scenes annotated with labeled data. For example,
in the case of shape classification, these labeled data
can have the form of tags, while in the case of segmentation,
the labeled data have the form of segmentation
boundaries or part labels. The labeled data can be provided
by humans or generated synthetically. After learning,
the learned models are applied on different sets of
shapes (test shapes) to produce results relevant to the task.

• Unsupervised algorithms co-analyze the input shapes or
scenes without any additional labeled data, i.e., the desired
output is unknown beforehand. The goal of these methods
is to discover correlations in the geometry and structure of
the input shape or scene data. For example, unsupervised
shape segmentation methods usually perform some type
of clustering in the feature space of points or patches
belonging to the input shapes.

• Semi-supervised algorithms make use of shapes (or
scenes) with and without any labeled data. Active learning
is a special case of semi-supervised learning in which
a learning algorithm interactively queries the user to obtain
desired outputs for more data points related to shapes.

In general, supervised methods tend to output results that
are closer to what a human would expect given the provided
labeled data. However, they may fail to produce desirable
results when the training shapes (or scenes) are geometrically
and structurally very dissimilar from the test shapes (or
scenes). They also tend to require a substantial amount of
labeled information as input, which can become a significant
burden for the user. Unsupervised methods can deal
with collections of shapes and scenes with larger variability
and require no human supervision. However, they sometimes
require parameter tuning to yield the desired results.
Semi-supervised methods represent a trade-off between
supervised and unsupervised methods: they provide more direct
control to the user about the desired result compared to
unsupervised methods, and often produce considerable
improvements in the results by making use of both labeled and
unlabeled shapes or scenes compared to supervised methods.
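The clustering step mentioned for unsupervised segmentation can be sketched with plain k-means over feature vectors of points or patches. K-means is used here as a generic stand-in, not as the algorithm of any particular surveyed method, and the farthest-point initialization is an assumption chosen to keep the example deterministic. The number of clusters `k` is the kind of parameter that may need tuning.

```python
import math


def kmeans(features, k, iterations=20):
    """Cluster feature vectors into k groups; returns (assignments, centers)."""
    # Farthest-point initialization: start from the first feature, then
    # repeatedly add the feature farthest from all centers chosen so far.
    centers = [tuple(features[0])]
    while len(centers) < k:
        centers.append(tuple(max(
            features, key=lambda f: min(math.dist(f, c) for c in centers))))
    assignments = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: each feature joins its nearest center.
        assignments = [
            min(range(k), key=lambda c: math.dist(f, centers[c]))
            for f in features
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [f for f, a in zip(features, assignments) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centers


# Usage: made-up 2D descriptors of six patches forming two obvious groups.
patch_features = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
                  (5.0, 5.1), (5.1, 5.0), (5.05, 5.05)]
patch_clusters, cluster_centers = kmeans(patch_features, k=2)
```

The cluster indices carry no semantic meaning by themselves, which is the practical difference from the supervised setting, where each output corresponds to a named part label.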

The data-driven loop. An advantageous feature of data-driven
shape processing is that the output data, produced
by learning and inference, typically come with rich semantic
information. For example, data-driven shape segmentation
produces parts with semantic labels [KHS10]; data-driven
reconstruction is commonly coupled with semantic
part or shape recognition [SFCH12, NXS12]; data-driven
shape modeling can generate readily usable shapes inheriting
the semantic information from the input data [XZZ*11].
These processed and generated data can be used to enrich
the existing shape collections with both training labels and
reusable contents, which in turn benefit subsequent learning.
In a sense, data-driven methods close the loop of data
generation and data analysis for 3D shapes and scenes; see
Figure 2. This concept has been practiced in several prior
works, such as the data-driven shape reconstruction framework
proposed in [PMG*05] (Figure 11).

Pipeline example. To help the reader grasp the pipeline of
data-driven methods, Figure 2 gives a schematic overview of
its components. Depending on the particular application,
the pipeline can have several variations, or some components
might be skipped. We discuss the main components
and steps of algorithms for each application in more detail in
the following sections. A didactic example of the pipeline in
the case of supervised shape segmentation is shown in Figure 3.
The input shapes are annotated with labeled part information.
A geometric descriptor is extracted for each point on
the training shapes, and the points are embedded in a common
feature space. The learning step uses a classification
algorithm that non-linearly separates the input space into a
set of regions corresponding to part labels in order to optimize
classification performance (more details are provided
in Section 4). Given a test shape, a probabilistic model is
used to infer part labels for each point on that shape based
on its geometric descriptor in the feature space.


2.4. A comparative overview 

Before reviewing the related works in detail under various
applications, we provide a comparative overview of the entire
body of works to be reviewed in this survey (see Table 4),
to correlate these methods under a set of criteria for
data-driven approaches to shape analysis and processing:

• Training data. We consider the representation, preprocessing
and scale of the training data. Note that once a
model is learned from the training data, it can be used
for inference on test data of a different modality. For single
shapes, the most commonly adopted representations are mesh
models and point clouds. 3D scenes are typically represented
as an arrangement of individual objects (mesh
models). Pre-processing includes pre-segmentation, over-segmentation,
pre-alignment, initial correspondence, and
labeling.

• Feature. Roughly speaking, there are two types of features
involved in data-driven shape processing. The most
commonly used features are low-level ones, such as local
geometric features (e.g., local curvature) and global shape
descriptors (e.g., shape distribution [OFCD02]). If the input
shapes are pre-segmented into meaningful parts, high-level
structural features (spatial relationships) can be derived.
Generally, working with high-level features enables
the learning of more powerful models for more advanced
inference tasks, such as structural analysis [MWZ*14], on
more complex data such as man-made objects and scenes.

• Learning model/approach. The specific choice of learning
method is application-dependent. In most cases, machine
learning approaches are adapted to geometric data
with feature extraction. For some problems, such as shape
correspondence, the core problem is to extract geometric
correlation between different shapes in an unsupervised
manner, which itself can be seen as a learning problem
specific to geometry processing.

• Learning type. As discussed above, there are three basic
types of data-driven approaches, depending on the
availability of labeled training data: supervised,
semi-supervised and unsupervised.

• Learning outcome. Learning produces a parametric
or non-parametric model (classifier, clustering, regressor,
etc.) used for inference, a learned distance metric
which can be utilized for further analysis, and/or feature
representations learned from raw data.

• Application. The main applications of data-driven shape
analysis and processing are: classification, segmentation,
correspondence, modeling, synthesis, reconstruction,
exploration and organization.

Figure 4: Fine-grained classification of 3D models
[HSG13], where text labels are propagated from brown to
blue models. (Panel labels: Chairs-with-arms, Club, Swivel, Rex.)

3. Shape Classification

Data-driven techniques commonly make assumptions about
the size and homogeneity of the input data set. In particular,
existing analysis techniques often assume that all
models belong to the same class of objects [KLM*13] or
scenes [FSH11], and cannot directly scale to an entire repository
such as the Trimble 3D Warehouse [Tri14]. Similarly,
techniques for data-driven reconstruction of indoor environments
assume that the input data set only has furniture
models [NXS12], while modeling and synthesis interfaces
restrict the input data to particular object or scene
classes [CKGK11, KCKK12, FRS*12]. Thus, as a first step
these methods query a 3D model repository to retrieve a subset
of relevant models.

Most public shape repositories such as 3D Warehouse
[Tri14] rely on the users to provide tags and names
of the shapes with few additional quality control measures.
As a result, the shapes are sparsely labeled with inconsistent
and noisy tags. This motivates developing automatic algorithms
to infer text associated with models. Existing work focuses
on establishing class memberships for an entire shape
(e.g., this shape is a chair), as well as inferring finer-scale
attributes (e.g., this chair has a rocking leg).

Classification methods assign a class membership to unlabeled shapes. One approach is to
retrieve for each unlabeled shape the most similar shape from a database of 3D models
with known shape classes. A large number of shape descriptors that can be used in such a
retrieval task have been proposed in recent years, and one can refer to the survey of
Tangelder et al. [TV08] for a thorough overview. One can further improve classification
results by leveraging machine learning techniques to learn classifiers that are based on
global shape descriptors [FHK*04, GKF09]. Barutcuoglu et al. [BD06] demonstrate that
Bayesian aggregation can be used to improve classification of shapes that are a part of a
hierarchical ontology of objects. Bronstein et al. [BBOG11] leverage “bag of features” to
learn powerful descriptor-space metrics for non-rigid shapes. These techniques can be
further improved by using sparse coding techniques [LBBC14].

Figure 5: Ranking of parts with respect to the “dangerous” attribute (image from
[CKG*13]).


Tag attributes often capture fine-scale attributes of shapes that belong to the same
class. These attributes can include presence or absence of particular parts, object
style, or comparative adjectives. Huang et al. [HSG13] developed a framework for
propagating these attributes in a collection of partially annotated 3D models. For
example, only brown models in Figure 4 were labeled, and blue models were annotated
automatically. To achieve automatic labeling, they start by co-aligning all models to a
canonical domain, and generate a voxel grid around the co-aligned models. For each voxel
they compute local shape features, such as spin images, for each shape. Then, they learn
a distance metric that best discriminates between different tags. All shapes are finally
embedded in a weighted feature space where nearest neighbors are connected in a graph. A
graph cut clustering is used to assign tags to unlabeled shapes.

While the above method works well for discrete tags, it does not capture more continuous
relations, such as animal A is more dangerous than animal B. Chaudhuri et al. [CKG*13]
focus on estimating rankings based on comparative adjectives. They ask people to compare
pairs of shape parts with respect to different adjectives, and use a Support Vector
Machine ranking method to predict attribute strengths from shape features for novel
shapes (Figure 5).

While the techniques described above are suitable for retrieving related models, most of
the described methods are not designed to understand intra-class variations. Usually a
more involved structural analysis is necessary to understand higher-level semantic
properties of shapes. Even for inferring tag attributes, existing work relies on shape
matching [HSG13] or shape segmentation [CKG*13]. The following two sections focus on
inferring these higher-level structural properties in collections of shapes.


4. Data-driven Shape Segmentation 

The goal of data-driven shape segmentation is to partition the shapes of an input
collection into parts, and also estimate part correspondences across these shapes. We
organize the literature on shape segmentation into the following three categories:
supervised segmentation, unsupervised segmentation, and semi-supervised segmentation,
following the main classification discussed in Section 2. Table 1 summarizes
representative techniques and reports their segmentation and part labeling performance
based on established benchmarks. Table 2 reports characteristic running times for the
same techniques.

Figure 6: A random forest classifier applied on depth data representing a human body
shape (image from [FGG*13]).


4.1. Supervised shape segmentation 

Classification techniques. Supervised shape segmentation is frequently formulated as a
classification problem. Given a training set of shapes containing points, faces or
patches that are labeled according to a part category (see Figure 3), the goal of a
classifier is to identify which part category other points, faces, or patches from
different shapes belong to. Supervised shape segmentation is executed in two steps:
during the first step, the parameters of the classifier are learned from the training
data. During the second step, the classifier is applied on new shapes. A simple linear
classifier has the form:

    $$c = f\Big(\sum_j \theta_j \cdot x_j\Big)$$    (1)

where $x_j$ is a geometric feature of a point (face, or patch), such as the ones
discussed in Section 2. The parameters $\theta_j$ serve as weights for each geometric
feature. The function $f$ is non-linear and maps to a discrete value (label), which is a
part category, or to probabilities per category. In general, choosing a good set of
geometric features that help predict part labels, and employing classifiers that can
discriminate the input data points correctly, are important design choices. There is no
rule of thumb on which is the best classifier for a problem. This depends on the
underlying distribution and characteristics of the input geometric features, their
dimensionality, the amount of labeled data, the existence of noise in the labeled data or
shapes, and training and test time constraints. For a related discussion on how to choose
a classifier for a problem, we refer the reader to [MRS08]. Due to the large
dimensionality and complexity of geometric feature spaces, non-linear classifiers are
more commonly used. For example, to segment human bodies into parts and recognize poses,
Microsoft's Kinect uses a random forest classifier trained on synthetic depth images of
humans of many shapes and sizes in highly varied poses sampled from a large motion
capture database [SFC*11] (Figure 6).
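As a rough illustration of Equation 1, the sketch below scores each face with a weighted
sum of its geometric features and maps the scores to part-label probabilities. It assumes
a softmax for the function f and one weight vector per part label, which is only one
possible choice; the function name `predict_part_labels` and the toy numbers are
hypothetical and are not taken from any of the cited methods.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise maximum for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict_part_labels(features, weights):
    """features: (n_faces, n_features); weights: (n_labels, n_features).

    Returns the most probable label per face and the per-label probabilities.
    """
    scores = features @ weights.T      # weighted feature sums, one score per label
    probs = softmax(scores)            # assumed choice for the non-linear function f
    return probs.argmax(axis=1), probs

# Toy example: three faces, two features, two part labels (all values made up).
features = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.2]])
weights = np.array([[2.0, -1.0], [-1.0, 2.0]])
labels, probs = predict_part_labels(features, weights)
```

In practice the weights would be learned from the labeled training shapes rather than
set by hand, and a non-linear classifier would often replace the linear scoring step.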

Segmentation  Learning            Type of           PSB rand index         L-PSB accuracy            COSEG
method        type                manual input      (# train. shapes       (# train. shapes          accuracy
                                                    if applicable)         if applicable)
------------  ------------------  ----------------  ---------------------  ------------------------  ---------------------
[KHS10]       supervised          labeled shapes    9.4% (19) / 14.8% (3)  95.3% (19) / 89.2% (3)    unknown
[BLVD11]      supervised          segmented shapes  8.8% (19) / 9.7% (6)   not applicable            not applicable
[HKG11]       unsupervised        none              10.1%                  not applicable            not applicable
[SvKK*11]     unsupervised        none              unknown                unknown                   87.7%
[vKTS*11]     supervised          labeled shapes    unknown                88.7% (12), see caption   unknown
[HFL12]       unsupervised        none              unknown                88.5%                     91.4%
[LCHB12]      semi-supervised     labeled shapes    unknown                92.3% (3)                 unknown
[WAvK*12]     semi-supervised     link constraints  unknown                unknown                   ‘close to error-free’
[WGW*13]      supervised          labeled images    unknown                88.0% (19), see caption   unknown
[KLM*13]      semi-/unsupervised  box templates     unknown                unknown                   92.7% (semi-superv.)
[HWG14]       unsupervised        none              unknown                unknown                   90.1%
[XSX*14]      supervised          labeled shapes    10.0%                  86.0%                     unknown
[XXLX14]      supervised          labeled shapes    10.2% (19)             94.2% (19) / 88.6% (5)    unknown

Table 1: Performance of data-driven methods for segmentation in the Princeton
Segmentation Benchmark (PSB) and COSEG datasets. Left to right: segmentation method,
learning type depending on the nature of data required as input to the method, type of
manual input if such is required, segmentation performance expressed by the rand index
metric [CGF09], and labeling accuracy [KHS10] based on the PSB and COSEG datasets. We
report the rand index segmentation error metric averaged over all classes of the PSB
benchmark. The labeling accuracy is averaged over the Labeled PSB (L-PSB) benchmark
excluding the “Bust”, “Mech”, and “Bearing” classes. The reason is that there are no
clear semantic correspondences between parts in these classes, or the ground-truth
segmentations do not sufficiently capture semantic parts in their shapes. We report the
labeling accuracy averaged over the categories of the COSEG dataset used in [SvKK*11].
The COSEG classes “iron”, “large chairs”, “large vases”, and “tele-aliens” were added
later and are excluded here since most papers do not report performance on them. We note
that van Kaick et al. [vKTS*11] reported the labeling accuracy in ten of the L-PSB
classes, while Wang et al. [WGW*13] reported the labeling accuracy in seven of the L-PSB
classes. The method by Kim et al. [KLM*13] can run in either semi-supervised or
unsupervised mode. In unsupervised mode, the corresponding labeling accuracy is 89.9% in
the COSEG dataset on average.


Structured models. For computer graphics applications, it is important to segment shapes
with accurate and smooth boundaries. For example, to help the user create a new shape by
re-combining parts from other shapes [FKS*04], irregular and noisy segmentation
boundaries can cause problems in the part attachment. From this aspect, using a
classifier per point/face independently is usually not enough. Thus, it is more common to
formulate the shape segmentation problem as an energy minimization problem that involves
a unary term assessing the consistency of each point/face with each part label, as well
as a pairwise term assessing the consistency of neighboring points/faces with pairs of
labels. For example, pairs of points that have low curvature (i.e., are on a flat
surface) are more likely to have the same part label. This energy minimization
formulation has been used in several single-shape and data-driven segmentations
(unsupervised or supervised) [KT03, ATC*05, SSS*ss, KHS10]. In the case of supervised
segmentation [KHS10], the energy can be written as:

    $$E(\mathbf{c}; \theta) = \sum_i E_{unary}(c_i; \mathbf{x}_i, \theta_1) + \sum_{i,j} E_{pairwise}(c_i, c_j; \mathbf{y}_{ij}, \theta_2)$$    (2)

where $\mathbf{c} = \{c_i\}$ is a vector of random variables representing the part label
per point (or face) $i$, $\mathbf{x}_i$ is its geometric feature vector, $i, j$ are
indices to points (or faces) that are considered neighbors, $\mathbf{y}_{ij}$ is a
geometric feature vector representing dihedral angle, angle between normals, or other
features, and $\theta = \{\theta_1, \theta_2\}$ are the energy parameters. The important
difference between supervised data-driven methods and previous single-shape segmentation
methods is that the parameters $\theta$ are automatically learned from the training
shapes to capture complex feature space patterns per part [ATC*05, KHS10]. We also note
that the above energy of Equation 2, when written in an exponentiated form and
normalized, can be treated as a probabilistic graphical model [KF09], called Conditional
Random Field [LMP01], that represents the joint probability distribution over part labels
conditioned on the input features:

    $$P(\mathbf{c} \mid \mathbf{x}, \mathbf{y}, \theta) = \exp(-E(\mathbf{c}; \theta)) / Z(\mathbf{x}, \mathbf{y}, \theta)$$    (3)

where $Z(\mathbf{x}, \mathbf{y}, \theta)$ is a normalization factor, also known as the
partition function. Minimizing the energy of Equation 2, or correspondingly finding the
assignment $\mathbf{c}$ that maximizes the above probability distribution, is known as a
Maximum A Posteriori inference problem that can be solved in various manners, such as
graph cuts, belief propagation, variational or linear programming relaxation techniques
[KF09].
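To make the structure of Equation 2 concrete, the following sketch evaluates an energy
with a tabulated unary term and a simple Potts-style pairwise penalty, and then lowers it
with iterated conditional modes (ICM). ICM is a basic local search used here only for
brevity; it is not one of the inference techniques listed above and is not the procedure
of [KHS10]. The Potts penalty, the function names `energy` and `icm`, and all toy numbers
are assumptions made for illustration.

```python
import numpy as np

def energy(labels, unary, edges, pairwise_weight):
    """E(c) = sum_i unary[i, c_i] + sum_(i,j) w_ij * [c_i != c_j] (assumed Potts form)."""
    e = unary[np.arange(len(labels)), labels].sum()
    for (i, j), w in zip(edges, pairwise_weight):
        if labels[i] != labels[j]:
            e += w
    return float(e)

def icm(unary, edges, pairwise_weight, n_sweeps=10):
    """Greedy coordinate descent: relabel one face at a time to lower the energy."""
    n, k = unary.shape
    labels = unary.argmin(axis=1)           # start from the unary-only labeling
    neighbors = [[] for _ in range(n)]
    for (i, j), w in zip(edges, pairwise_weight):
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))
    for _ in range(n_sweeps):
        changed = False
        for i in range(n):
            cost = unary[i].copy()
            for j, w in neighbors[i]:
                # Penalize every label that disagrees with neighbor j.
                cost += w * (np.arange(k) != labels[j])
            best = int(cost.argmin())
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels

# Toy chain of four faces and two labels; face 2 has a noisy unary cost.
unary = np.array([[0.1, 2.0], [0.2, 1.5], [0.6, 0.5], [0.3, 1.8]])
edges = [(0, 1), (1, 2), (2, 3)]
weights = [1.0, 1.0, 1.0]
initial = unary.argmin(axis=1)
smoothed = icm(unary, edges, weights)
```

In this toy case the pairwise term overrides the noisy unary preference of the third
face, which is the kind of boundary smoothing the structured formulation is meant to
provide.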

The parameters $\theta$ can be jointly learned through maximum likelihood (ML) or maximum
a posteriori (MAP) estimates [KF09]. However, due to the high computational complexity of
ML or MAP learning and the non-linearity of classifiers used in shape segmentation, it is
common to train the parameters $\theta_1$ and $\theta_2$ of the model separately, i.e.,
train the classifiers of the unary and pairwise term separately [SM05]. The exact form of
the unary and pairwise terms varies across supervised shape segmentation methods: the
unary term can have the form of a log-linear model [ATC*05], a cascade of JointBoost
classifiers [KHS10], GentleBoost [vKTS*11], or feedforward neural networks [XXLX14]. The
pairwise term can have the form of a learned log-linear model [ATC*05], a label-dependent
GentleBoost classifier [KHS10], or a smoothness term based on dihedral angles and edge
length tuned by experimentation [SSS*ss, vKTS*11, XXLX14]. Again, the form of the unary
and pairwise terms depends on the amount of training data, the dimensionality and
underlying distribution of the geometric features used, and the computational cost.

Joint labeling. Instead of applying the learned probabilistic model to a single shape, an
alternative approach is to find correspondences between faces of pairs of shapes, and
incorporate a third “inter-shape” term in the energy of Equation 2 [vKTS*11]. The
“inter-shape” term favors pairs of corresponding faces on different shapes to have the
same label. As a result, the energy can be minimized jointly over a set of shapes to take
into account any additional correspondences.

Boundary learning. Instead of applying a classifier per mesh point, face or patch to
predict a part label, a different approach is to predict the probability of each polygon
mesh edge to serve as a segmentation boundary or not [BLVD11]. The problem can be
formulated as a binary classifier (e.g., Adaboost) that is trained from human
segmentation boundaries. The inputs to the classifier are geometric features of edges,
such as dihedral angles, curvature, and shape diameter, and the output is a probability
for an edge to be a segmentation boundary. Since the predicted probabilities over the
mesh do not correspond to closed smooth boundaries, a thinning step and an active contour
model [KWT88] are used as post-processing to produce the final segmentations.

Transductive segmentation. Another way to formulate the shape segmentation problem is to
group patches on a mesh such that the segment similarity is maximized between the
resulting segments and the provided segments in the training database. The segment
similarity can be measured as the reconstruction cost of the resulting segment from the
training ones. The grouping of patches can be solved as an integer programming problem
[XSX*14].

Shape segmentation from labeled images. Instead of using labeled training shapes for
supervised shape segmentation, an alternative source of training data can come in the
form of segmented and labeled images, as demonstrated by Wang et al. [WGW*13]. Given an
input 3D shape, this method first renders 2D binary images of it from different
viewpoints. Each binary image is used to retrieve multiple segmented and labeled training
images from an input database based on a bi-class Hausdorff distance measure. Each
retrieved image is used to perform label transfer to the 2D shape projections. All
labeled projections are then back-projected onto the input 3D model to compute a labeling
probability map. The energy function for segmentation is formulated by using this
probability map in the unary term expressed per face or point, while dihedral angles and
Euclidean distances are used in the pairwise term.

4.2. Semi-supervised shape segmentation 

Entropy regularization. The parameters $\theta$ of Equation 2 can be learned not only
from the training labeled shapes, but also from the unlabeled shapes [LCHB12]. The idea
is that learning should maximize the likelihood function of the parameters over the
labeled shapes, and also minimize the entropy (uncertainty) of the classifier over the
unlabeled shapes (or correspondingly maximize the negative entropy). Minimizing the
entropy over unlabeled shapes encourages the algorithm to find putative labelings for the
unlabeled data [JWL*06]. However, it is generally hard to strike a balance between the
likelihood and entropy terms.
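The trade-off between the two terms can be sketched as a single objective: the
log-likelihood over labeled samples minus a weighted entropy over unlabeled samples. The
code below assumes a plain softmax classifier in place of the CRF of Equation 2, and the
balance weight `lam`, the function name, and the toy data are hypothetical; it is meant
only to show how the two terms combine, not to reproduce [LCHB12].

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_regularized_objective(weights, x_lab, y_lab, x_unl, lam):
    """Log-likelihood on labeled data minus lam times entropy on unlabeled data.

    Larger values are better: labeled samples are explained well and the
    classifier is confident on the unlabeled samples.
    """
    p_lab = softmax(x_lab @ weights.T)
    log_lik = np.log(p_lab[np.arange(len(y_lab)), y_lab]).sum()
    p_unl = softmax(x_unl @ weights.T)
    ent = -(p_unl * np.log(p_unl + 1e-12)).sum()   # small constant avoids log(0)
    return float(log_lik - lam * ent)

# Toy data: two labeled and three unlabeled feature vectors, two part labels.
x_lab = np.array([[1.0, 0.0], [0.0, 1.0]])
y_lab = np.array([0, 1])
x_unl = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])

w_uniform = np.zeros((2, 2))                         # predicts 50/50 everywhere
w_confident = np.array([[4.0, -4.0], [-4.0, 4.0]])   # separates the two labels

obj_uniform = entropy_regularized_objective(w_uniform, x_lab, y_lab, x_unl, lam=0.5)
obj_confident = entropy_regularized_objective(w_confident, x_lab, y_lab, x_unl, lam=0.5)
```

Choosing `lam` is exactly the balancing problem mentioned above: a large value rewards
confident predictions on unlabeled data even when they are wrong, while a small value
ignores the unlabeled shapes.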

Metric embedding and active learning. A more general formulation for semi-supervised
segmentation was presented in [WAvK*12]. Starting from a set of shapes that are
co-segmented in an unsupervised manner [SvKK*11], the user interactively adds two types
of constraints: “must-link” constraints, which specify that two patches (super-faces)
should belong to the same cluster, and “cannot-link” constraints, which specify that two
patches must be in different clusters. These constraints are used to perform constrained
clustering in an embedded feature space of super-faces coming from all the shapes of the
input dataset. The key idea is to transform the original feature space, such that
super-faces with “must-link” constraints come closer together to form a cluster in the
embedded feature space, while super-faces with “cannot-link” constraints move away from
each other. To minimize the effort required from the user, the method suggests to the
user pairs of points in feature space that, when constrained, are likely to improve the
co-segmentation. The suggestions involve points that are far from their cluster centers,
and have a low confidence of belonging to their clusters.
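One simple way to picture the suggestion step is to rank points by how ambiguous their
cluster membership is. The sketch below uses the gap between the distances to the nearest
and second-nearest cluster centers as a stand-in for confidence; this margin criterion,
the function name `suggest_uncertain`, and the toy data are assumptions for illustration,
and the sketch ranks single points rather than the pairs of points used in [WAvK*12].

```python
import numpy as np

def suggest_uncertain(points, centers, n_suggestions):
    """Return indices of the points whose cluster membership is least clear.

    A point is considered uncertain when its nearest and second-nearest
    cluster centers are almost equally far away (small margin).
    """
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    sorted_dists = np.sort(dists, axis=1)
    margin = sorted_dists[:, 1] - sorted_dists[:, 0]
    return np.argsort(margin)[:n_suggestions]

# Toy embedded feature space with two cluster centers.
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
points = np.array([[0.0, 0.0], [5.1, 0.0], [10.0, 0.0], [4.0, 0.0]])
suggested = suggest_uncertain(points, centers, n_suggestions=2)
```

Points near a cluster center are never suggested under this criterion, which matches the
intuition that user effort is best spent on super-faces lying between clusters.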

Template fitting. A different form of partial supervision can come in the form of
part-based templates. Kim et al.’s method [KLM*13] allows users to specify or refine a
few templates made out of boxes representing expected parts in an input database. The
boxes are iteratively fit to the shapes of a collection through simultaneous alignment,
surface segmentation and point-to-point correspondences estimated between each template
and each input shape. Alternatively, the templates can be inferred automatically from the
shapes of the input collection without human supervision based on single-shape
segmentation heuristics. Optionally, the user can refine and improve these estimated
templates. From this aspect, Kim et al.’s method can run in either semi-supervised or
unsupervised mode. It was also the first method to handle segmentation and
correspondences in collections with size in the order of thousands of shapes.

4.3. Unsupervised segmentation 

Unsupervised data-driven shape segmentation techniques fall into two categories:
clustering based techniques and matching based techniques. In the following, we highlight
the key idea of each type of approach.

submitted to COMPUTER GRAPHICS Forum (2/2015).

K. Xu & V. Kim & Q. Huang & E. Kalogerakis / Data-Driven Shape Analysis and Processing

Segmentation  Reported                            Dataset size for                     Reported
method        running times                       reported running times               processor
------------  ----------------------------------  -----------------------------------  -------------------------
[KHS10]       8h train. / 5 min test.             6 train. shapes / 1 test shape       Intel Xeon E5355 2.66GHz
[BLVD11]      10 min train. / 1 min test.         unknown for train. / 1 test shape    Intel Core 2 Duo 2.99GHz
[HKG11]       32h                                 380 shapes                           unknown, 2.4GHz
[SvKK*11]     10 min                              30 shapes                            AMD Opteron 2.4GHz
[vKTS*11]     10h train. / few min test.          20-30 train. shapes / 1 test shape   AMD Opteron 1GHz
[HFL12]       8 min (excl. feat. extr.)           20 shapes                            Intel dual-core 2.93GHz
[LCHB12]      7h train. / few min test.           20 shapes                            Intel i7 2600 3.4GHz
[WAvK*12]     7 min user interaction              28 shapes                            unknown
[WGW*13]      1.5 min (no train. step)            1 test shape                         unknown
[KLM*13]      11h                                 7442 shapes                          unknown
[HWG14]       33h                                 8401 shapes                          unknown, 3.2GHz
[XSX*14]      30 sec (no train. step)             1 test shape                         Intel i5 CPU
[XXLX14]      15 sec train. (excl. feat. extr.)   6 train. shapes                      Intel Quad-Core 3.2GHz

Table 2: Running times reported for the data-driven segmentation methods of Table 1. We
note that running times are reported for different dataset sizes and processors in the
referenced papers, while it is frequently not specified whether the execution uses one or
multiple threads or whether the running times include all the algorithm steps, such as
super-face or feature extraction. Exact processor information is also frequently not
provided. Thus, the reported running times of this table are only indicative and should
not serve as a basis for a fair comparison.


Clustering based techniques are adapted from supervised techniques. They compute feature
descriptors on points or faces. Clustering is performed over all points/faces over all
shapes. Each resulting cluster indicates a consistent segment across the input shapes.
The promise of the clustering based approach is that when the number of shapes becomes
large, the sampling in the clustering space becomes dense enough, so that certain
statistical assumptions are satisfied, e.g., diffusion distances between points from
different clusters are significantly larger than those between points within each
cluster. When these assumptions are satisfied, clustering based approaches can produce
results that are comparable to supervised techniques (c.f. [HFL12]). In addition, the
clustering method being employed plays an important role in the segmentation results. In
[SvKK*11], the authors utilize spectral clustering. In [HFL12], the authors employ
subspace clustering, a more advanced clustering method, to obtain improved results.
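The pooling idea behind clustering based co-segmentation can be sketched in a few lines:
descriptors from all shapes are stacked into one array, clustered once, and the cluster
index of each face is read back per shape. The sketch uses plain k-means with fixed
initial centers purely as a stand-in; [SvKK*11] and [HFL12] use spectral and subspace
clustering respectively. The one-dimensional descriptors, the function name `kmeans`, and
the initialization are all hypothetical.

```python
import numpy as np

def kmeans(points, init_centers, n_iters=20):
    """Plain Lloyd iterations; returns a cluster index per point and the centers."""
    centers = init_centers.astype(float).copy()
    for _ in range(n_iters):
        # Squared distance from every point to every center.
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return labels, centers

# Face descriptors pooled from two shapes (three faces each, made-up values).
pooled = np.array([[0.0], [0.2], [10.0], [9.8], [0.1], [10.1]])
shape_ids = np.array([0, 0, 0, 1, 1, 1])

labels, centers = kmeans(pooled, init_centers=pooled[[0, 2]])

# Consistent segment index per face, read back for each shape.
segments_shape0 = labels[shape_ids == 0]
segments_shape1 = labels[shape_ids == 1]
```

Because every shape is labeled from the same set of clusters, faces with the same index
on different shapes are treated as belonging to corresponding segments.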

Another line of unsupervised methods pursues clustering of parts. In [XLZ*10], the
authors perform co-analysis over a set of shapes via factoring out the part scale
variation by grouping the shapes into different styles, where style is defined by the
anisotropic part scales of the shapes. In [vKXZ*13], the authors introduce unsupervised
co-hierarchical analysis of a set of shapes. They propose a novel cluster-and-select
scheme for selecting representative part hierarchies for all shapes and grouping the
shapes according to the hierarchies. The method can be used to compute consistent
hierarchical segmentation for the input set.

Matching based methods [GF09, HKG11, WHG13, HWG14] build maps across shapes and utilize
these maps to achieve consistency of segmentations. As shown in Figure 7, this strategy
allows us to identify meaningful parts despite the lack of strong geometric cues on a
particular shape. Likewise, the approach is able to identify coherent single parts even
when the geometry of the individual shape suggests the presence of multiple segments. A
challenge here is to find a suitable shape representation so that maps across diverse
shapes are well-defined. In [HKG11], Huang et al. introduce an optimization strategy that
jointly optimizes shape segmentations and maps between optimized segmentations. Since the
maps are defined at the part level, this technique is suitable for heterogeneous shape
collections. Experimentally, it generates results comparable to the supervised method
[KHS10] on the Princeton segmentation benchmark. Recently, Huang et al. [HWG14]
formulated the same idea under the framework of functional maps [OBCS*12] and gained
improved segmentation quality and computational efficiency.

Figure 7: Comparison of single-shape segmentation (left) and joint shape segmentation
(right) on models from the PSB benchmark [CGF09]. Each segmentation on the left was
produced by the top-performing algorithm in the benchmark for that shape. The
segmentations on the right were produced by [HKG11], which jointly optimized
segmentations and correspondences across the entire dataset.


5. Joint Shape Matching 

Another fundamental problem in shape analysis is shape matching, which finds relations or
maps between shapes. These maps allow us to transfer information across shapes and
aggregate information from a collection of shapes for a better understanding of
individual shapes (e.g., detecting shared structures such as skeletons or shape parts).
They also provide a powerful platform for comparing shapes (i.e., with respect to
different measures and at different places). As we can see from other sections, shape
maps are widely applied in shape classification and shape exploration as well.

So far most existing research in shape matching has focused on matching pairs of shapes
in isolation. We refer to [vKZHCO11] for a survey and to [LH05, LF09, vKZHCO11, OMMG10,
KLF11, OBCS*12] for recent advances. Although significant progress has been made,
state-of-the-art techniques are limited to shapes that are similar to each other, and
tend to be insufficient for shapes that undergo large geometric and topological
variations.

The availability of large shape collections offers opportunities to address this issue.
Intuitively, when matching two dissimilar shapes, we may utilize intermediate shapes to
transfer maps. In other words, we can build maps between similar shapes, and use the
composite maps to obtain maps between less similar shapes. As we will see shortly, this
intuition can be generalized to enforcing a cycle-consistency constraint, namely that
composite maps along cycles should be identity maps or, in other words, that the
composite map between two shapes is path-independent. In this section, we discuss joint
shape matching techniques that take a shape collection and initial noisy maps computed
between pairs of shapes as input, and output improved maps across the shape collection.

5.1. Model Graph and Cycle-Consistency 

To formulate the joint matching problem, we consider a model graph
$\mathcal{G} = (\mathcal{S}, \mathcal{E})$ (c.f. [Hub02]). The vertex set
$\mathcal{S} = \{S_1, \cdots, S_n\}$ consists of the input shapes. The edge set
$\mathcal{E}$ characterizes the pairs of shapes that are selected for performing
pair-wise matching. For small-scale datasets, we typically match all pairs of shapes. For
large-scale datasets, the edge set usually connects shapes that are similar according to
a pre-defined shape descriptor [KLM*12, HSG13], thus generating a sparse shape graph.

The key component of a joint matching algorithm is to utilize the so-called
cycle-consistency constraint. Specifically, if all the maps in $\mathcal{G}$ are correct,
then composite maps along any loops should be identity maps. This is true for maps that
are represented as transformations (e.g., rotations and rigid/affine transformations), or
full point-wise maps that can be described as permutation matrices. We can easily modify
the constraint to handle partial maps, namely each point, when transformed along a loop,
either disappears or goes back to the original point (see [HWG14] for details).
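For full point-wise maps, the cycle-consistency test reduces to composing index arrays
and comparing the result with the identity. The sketch below assumes that a map from one
shape to another is stored as an integer array whose entry p holds the index of the
corresponding point on the target shape; the function names and the three-point toy maps
are hypothetical.

```python
import numpy as np

def compose(map_ab, map_bc):
    """Composite map A -> C: point p goes to map_bc[map_ab[p]]."""
    return map_bc[map_ab]

def is_cycle_consistent(maps_along_cycle):
    """True if following the maps around a closed loop returns every point to itself."""
    n_points = len(maps_along_cycle[0])
    composite = np.arange(n_points)
    for m in maps_along_cycle:
        composite = m[composite]
    return bool(np.array_equal(composite, np.arange(n_points)))

# Three shapes with three points each (toy maps).
m12 = np.array([1, 2, 0])        # shape 1 -> shape 2
m23 = np.array([0, 2, 1])        # shape 2 -> shape 3
m31 = np.array([2, 1, 0])        # shape 3 -> shape 1, closes the loop correctly
m31_bad = np.array([0, 1, 2])    # an incorrect map on the same edge

m13 = compose(m12, m23)          # composite map from shape 1 to shape 3
consistent = is_cycle_consistent([m12, m23, m31])
inconsistent = is_cycle_consistent([m12, m23, m31_bad])
```

A failed check only says that at least one map on the loop is wrong; it does not say
which one, which is why the techniques that follow reason over many cycles at once.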

The cycle-consistency constraint is useful because the initial maps, which are computed
between pairs of shapes in isolation, are not expected to satisfy the cycle-consistency
constraint. On the other hand, although we do not know which maps or correspondences are
incorrect, we can detect inconsistent cycles. These inconsistent cycles provide useful
information for us to detect incorrect correspondences or maps, i.e., an inconsistent
cycle indicates that at least one of the participating maps or correspondences is
incorrect. To turn this observation into algorithms, one has to formulate the
cycle-consistency constraint properly. Existing works in data-driven shape matching fall
into two categories: combinatorial techniques and matrix recovery based techniques. The
remainder of this section provides the details.

Figure 8: Joint shape matching takes as input maps computed between pairs of shapes in
isolation and utilizes the cycle-consistency constraint to improve shape maps. This
figure shows the result of Huang et al. [HWG14], which performs joint shape matching
under the functional map setting. Panels: initial maps and optimized maps.

5.2. Combinatorial Techniques 

Spanning tree optimization. Earlier works in joint matching aim at finding a spanning
tree in the model graph. In [GMB04, HEG*06], the authors propose to use the maximum
spanning tree (MST) of the model graph. However, this strategy can easily fail since a
single incorrect edge in the MST may break the entire matching result. In the seminal
work [Hub02], Huber showed that finding the best spanning tree maximizing the number of
consistent edges is NP-hard. Although finding the best spanning tree is not tractable,
Huber introduced several local operations for improving the score of spanning trees.
However, these approaches are generally limited to small-scale problems so that the
search space can be sufficiently explored.

Inconsistent cycle detection. Another line of approaches [ZKP10, RSSS11, NBCW*11] applies
global optimization to select cycle-consistent maps. These approaches are typically
formulated as solving constrained optimization problems, where objective functions encode
the scores of selected maps, and constraints enforce the consistency of selected maps
along cycles. The major advantage of these approaches is that the correct maps are
determined globally. However, as the cycle-consistency constraint needs to apportion
blame along many edges on a cycle, the success of these approaches relies on the
assumption that correct maps are dominant in the model graph so that the small number of
bad maps can be identified through their participation in many bad cycles.

MRF formulation. Joint matching may also be formulated as solving a second order Markov
Random Field (or MRF) [CAF10b, CAF10a, COSH11, HZG*12]. The basic idea is to sample the
transformation/deformation space of each shape to obtain a candidate set of
transformation/deformation samples per shape. Joint matching is then formulated as
selecting the best sample for each shape. The objective function considers initial maps.
Specifically, each pair of samples from two different shapes would generate a candidate
map between them. The objective function then formulates second-order potentials, where
each term characterizes the alignment score between these candidate maps and the initial
maps [HSG13, HZG*12].

The key challenge in the MRF formulation is generating the candidate samples for each
shape. The most popular strategy is to perform uniform sampling [COSH11, HSG13], which
works well when the transformation space is low-dimensional. To apply the MRF formulation
on high-dimensional problems, Huang et al. [HZG*12] introduce a diffusion-and-sharpening
strategy. The idea is to diffuse the maps along the model graph to obtain rich samples of
candidate transformations or correspondences and then perform clustering to reduce the
number of candidate samples.

5.3. Matrix Based Techniques 

A recent trend in map computation is to formulate joint map computation as inferring
matrices [SW11, KLM*12, HZG*12, WS13, HG13, CGH14, HWG14]. The basic idea is to consider
a big map collection matrix

    $$X = \begin{pmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{nn} \end{pmatrix}$$

where each block $X_{ij}$ encodes the map from shape $S_i$ to shape $S_j$. In this matrix
representation, the cycle-consistency constraint can be equivalently described as simple
properties of $X$, i.e., depending on the types of maps, $X$ is either positive
semidefinite or low-rank (c.f. [HG13, HWG14]). In addition, we may view the initial
pair-wise maps as noisy measurements of the entries of $X$. Based on this perspective, we
can formulate joint matching as matrix recovery from noisy measurements of its entries.
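The link between cycle consistency and the algebraic properties of X can be checked
numerically on a small example. The sketch below assumes that consistent maps arise from
assigning every point of every shape to a slot in a shared reference ordering, so that
block X_ij is the product of two permutation matrices; this construction, the function
names, and the toy permutations are assumptions made for illustration.

```python
import numpy as np

def perm_matrix(perm):
    """Permutation matrix with a 1 at (p, perm[p]) for every point p."""
    m = len(perm)
    mat = np.zeros((m, m))
    mat[np.arange(m), perm] = 1.0
    return mat

def map_collection_matrix(perms):
    """Stack per-shape permutation matrices P_i and form X with blocks P_i P_j^T."""
    stacked = np.vstack([perm_matrix(p) for p in perms])
    return stacked @ stacked.T

# perms[i][p] = slot of point p of shape i in the shared reference ordering.
perms = [np.array([0, 1, 2]), np.array([1, 2, 0]), np.array([2, 0, 1])]
x_consistent = map_collection_matrix(perms)

# Replace the maps between shapes 0 and 1 with an incorrect (identity) map.
x_corrupted = x_consistent.copy()
x_corrupted[0:3, 3:6] = np.eye(3)
x_corrupted[3:6, 0:3] = np.eye(3)

rank_consistent = np.linalg.matrix_rank(x_consistent)
rank_corrupted = np.linalg.matrix_rank(x_corrupted)
min_eigenvalue = np.linalg.eigvalsh(x_consistent).min()
```

With three points per shape, the consistent matrix has rank three and no negative
eigenvalues, while a single wrong block already raises the rank.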

Spectral techniques. The initial attempts at matrix recovery
are spectral techniques and their variants [SW11, KLM*12,
WHG13]. The basic idea is to consider the map collection
matrix $X_{\text{input}}$ that encodes the initial maps in its
blocks. The recovered matrix is then given by
$X = U \Sigma V^T$, where $U$, $\Sigma$, and $V$ are given by the
singular value decomposition (SVD) of $X_{\text{input}}$, keeping
only the leading singular values up to the given rank. Various
methods have added heuristics on top of this basic procedure.
For example, Kim et al. [KLM*12] use the optimized
maps to recompute initial maps.

This SVD strategy can be viewed as matrix recovery because
$X$ is equivalent to the optimal low-rank approximation
of $X_{\text{input}}$ (with given rank) under the matrix Frobenius norm.
However, as the input maps may contain outliers, employing
the Frobenius norm for matrix recovery is sub-optimal.
Moreover, it is hard to analyze these techniques, even in the
very basic setting where maps are given by permutation
matrices [PKS13].
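
The basic spectral procedure can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the method of any cited paper: the maps are permutation matrices, the target rank equals the number of points $m$, and the function name `spectral_recover` and the toy corruption pattern are hypothetical.

```python
import numpy as np

def spectral_recover(x_input, rank):
    """Best rank-`rank` approximation of x_input in the Frobenius norm (truncated SVD)."""
    u, s, vt = np.linalg.svd(x_input, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

rng = np.random.default_rng(0)
n, m = 6, 4                                     # 6 shapes, 4 points each (toy sizes)

# Ground truth: each shape is a permutation of a common universe, X_true = Y Y^T.
Y = np.vstack([np.eye(m)[rng.permutation(m)] for _ in range(n)])
X_true = Y @ Y.T

# Noisy input: overwrite two pair-wise maps with random permutations.
X_input = X_true.copy()
for i, j in [(0, 1), (2, 4)]:
    P = np.eye(m)[rng.permutation(m)]
    X_input[i * m:(i + 1) * m, j * m:(j + 1) * m] = P
    X_input[j * m:(j + 1) * m, i * m:(i + 1) * m] = P.T

X_rec = spectral_recover(X_input, rank=m)
```

Because the truncated SVD is the optimal low-rank approximation, `X_rec` is never farther from `X_input` than the rank-$m$ ground truth is. A complete pipeline would additionally round each recovered block back to a valid map, a step omitted here.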

Point-based maps. In a series of works, Huang and coworkers
[HG13, CGH14, HCG14] consider the case of point-based
maps and develop joint matching algorithms that admit
theoretical guarantees. The work of [HG13] considers
the basic setting of permutation matrix maps and proves the
equivalence between cycle-consistent maps and the low-rank
or positive semidefinite property of the map collection matrix.
This leads to a semidefinite programming formulation for
joint matching. In particular, the $L_1$ norm is used to measure
the distance between the recovered maps and the initial
maps. The authors provide exact recovery conditions, which
state that the ground-truth maps can be recovered if the
percentage of incorrect correspondences in the input maps is
below a constant. In a follow-up work, Chen et al. [CGH14]
extend it to partial maps and provide a better analysis in the
case where incorrect correspondences in the input maps are
random. The computational issue is addressed in [HCG14],
which employs the alternating direction method of multipliers
(ADMM) for optimization.

Rotations and functional maps. Maps that are represented
by general matrices (e.g., rotations or functional maps) can
also be handled in a similar fashion. In [WS13], Wang and
Singer consider the case of rotations between objects. Their
formulation is similar to [HG13] but utilizes an $L_1$ Frobenius
norm for measuring the distance between initial rotations
and recovered rotations. Recently, Huang et al. [HWG14]
extend the idea to functional maps. The major difference
between functional maps and point-based maps or rotations is
that the map collection matrix is no longer symmetric. Thus,
their method is formulated to recover low-rank matrices.


5.4. Discussion and Future Directions 

The key to a joint shape matching algorithm is to have a
proper formulation of the cycle-consistency constraint. We
have witnessed the evolution from earlier works on
combinatorial search and detecting inconsistent cycles to more
recent works on spectral techniques, MRF based methods
and matrix recovery techniques. In particular, matrix
recovery techniques admit theoretical guarantees. They provide
a fundamental understanding of why joint shape matching can
improve upon isolated pair-wise matching.

One future direction is to integrate pair-wise matching and
joint matching into one optimization problem. Since the
major role of joint matching is to remove the noise present
in pair-wise matching, it makes sense to perform them
together. Such unified approaches have the potential to further
improve upon decomposed approaches (i.e., from pair-wise
to joint). The technical challenge is to find map
representations so that pair-wise matching and map consistency
can be formulated in the same framework.

Figure 9: Derived from a dataset of prototypical 3D scans
of faces, the morphable face model contributes to two main
steps in face manipulation: (1) deriving a 3D face model
from a novel image, and (2) modifying shape and texture in
a natural way [BV99].

6. Data-Driven Shape Reconstruction 

Reconstructing geometric shapes from physical objects is a
fundamental problem in geometry processing. The input to
this problem is usually a point cloud produced by aligned
range scans, which provides an observation of an object. The
goal of a shape reconstruction algorithm is to convert this
point cloud into a high-quality geometric model. In practice,
the input point cloud data is noisy and incomplete; thus,
the key to a successful shape reconstruction algorithm is
formulating appropriate shape priors. Traditional shape
reconstruction algorithms usually utilize generic priors, such as
surface smoothness [DTB06], and typically assume that the
input data captures most of the object’s surface. To handle
higher degrees of noise and partiality in the input data, it is
important to build structural shape priors.

Data-driven techniques tackle this challenge by leveraging
shape collections to learn strong structural priors from
similar objects, and use them to reconstruct high-quality 3D
models. Existing approaches fall into two categories, based
on how they represent the shape priors: parametric and
non-parametric. The former usually builds a low-dimensional
parametric representation of the underlying shape space,
learning the representation from exemplars and enforcing
the parameterization when reconstructing new models.
Parametric methods typically require building correspondences
across the exemplar shapes. In contrast, non-parametric
methods operate directly on the input shapes by copying
and deforming existing shapes or shape parts; they are
designed for shapes with large variations, such as man-made
objects.


6.1. Parametric Methods 

Morphable face. A representative work in parametric
data-driven shape reconstruction is the morphable face
model [BV99], which is designed for reconstructing 3D
textured faces from photos and scans. The model is learned
from a dataset of prototypical 3D shapes of faces, and the
model can then be used to derive a 3D face model from a
novel image and to modify shape and texture in a natural
way (see Figure 9).

Figure 10: Parameterizing the variation in human shapes
can be used to synthesize new individuals or edit existing
ones [ACP03].

In particular, the morphable face model represents the
geometry of a face with a shape-vector
$S = (p_1^T, p_2^T, \ldots, p_n^T)^T \in \mathbb{R}^{3n}$
that contains the 3D coordinates of its $n$ vertices.
Similarly, it encodes the texture of a face by a texture-vector
$T = (c_1^T, c_2^T, \ldots, c_n^T)^T \in \mathbb{R}^{3n}$
that contains the RGB color values of the corresponding
vertices. A morphable face model
is then constructed using a database of $m$ exemplar faces,
each represented by its shape-vector $S_i$ and texture-vector
$T_i$. In [BV99], the exemplar faces are constructed by
matching a template to scanned human faces.


The morphable face model uses Principal Component
Analysis (PCA) to characterize the shape space. A new
shape and its associated texture are given by

\[
S_{\text{mod}} = \bar{S} + \sum_{i=1}^{m-1} \alpha_i s_i, \qquad
T_{\text{mod}} = \bar{T} + \sum_{i=1}^{m-1} \beta_i t_i,
\]

where $\bar{S}$ and $\bar{T}$ are the mean-shape and mean-texture,
respectively, $s_i$ and $t_i$ are eigenvectors of the covariance
matrices, and $\alpha_i$ and $\beta_i$ are coefficients. PCA also
gives probability distributions over the coefficients. The
probability for the coefficients $\alpha_i$ is given by

\[
p(\{\alpha_i\}) \sim \exp\left( -\frac{1}{2} \sum_{i=1}^{m-1} \left( \alpha_i / \sigma_i \right)^2 \right),
\]

with $\sigma_i^2$ being the eigenvalues of the shape covariance matrix
$C_S$ (the probability $p(\{\beta_i\})$ is computed in a similar way).
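
The shape half of such a PCA model can be sketched with NumPy. This is a minimal illustration and not the implementation of [BV99]. It assumes that the exemplar shape-vectors are already in vertex correspondence and that the covariance is normalized by $1/(m-1)$. The function names and the random toy data are hypothetical.

```python
import numpy as np

def fit_morphable_model(exemplars):
    """exemplars: (m, 3n) array with one shape-vector per row."""
    m = exemplars.shape[0]
    mean = exemplars.mean(axis=0)
    # SVD of the centered data gives the covariance eigenvectors without forming C_S.
    _, sing, vt = np.linalg.svd(exemplars - mean, full_matrices=False)
    basis = vt[:m - 1]                        # rows are the eigenvectors s_i
    sigma = sing[:m - 1] / np.sqrt(m - 1)     # sigma_i^2 are eigenvalues of C_S (assumed 1/(m-1) scaling)
    return mean, basis, sigma

def synthesize(mean, basis, alpha):
    """S_mod = mean + sum_i alpha_i * s_i."""
    return mean + alpha @ basis

def log_prior(alpha, sigma):
    """Unnormalized log-probability: -1/2 * sum_i (alpha_i / sigma_i)^2."""
    return -0.5 * float(np.sum((alpha / sigma) ** 2))

rng = np.random.default_rng(0)
exemplars = rng.normal(size=(5, 12))          # toy data: m = 5 shapes, n = 4 vertices
mean, basis, sigma = fit_morphable_model(exemplars)

# Project an exemplar onto the model and reconstruct it.
alpha_0 = (exemplars[0] - mean) @ basis.T
recon_0 = synthesize(mean, basis, alpha_0)

# Draw a new plausible shape by sampling each coefficient from N(0, sigma_i^2).
alpha_new = rng.normal(size=sigma.shape) * sigma
new_shape = synthesize(mean, basis, alpha_new)
```

With $m$ exemplars the centered data spans at most $m-1$ directions, so the $m-1$ retained components reproduce every exemplar exactly. The texture model follows the same pattern with $T$ in place of $S$.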


With this morphable face model, reconstruction of textured
models can be posed as a small-scale non-linear
optimization problem. For example, given a 2D image of a
human face $I_{\text{input}}$, one can reconstruct the underlying
textured 3D model by searching for a similar rendered face
$I(\{\alpha_i\}, \{\beta_i\}, p)$, parameterized by the shape and
texture coefficients $\alpha_i$ and $\beta_i$, and the rendering
parameters $p$ (e.g., camera configuration, lighting parameters).
The optimization problem is formulated as minimizing a data
term, which measures the distance between the input image and
the rendered image, and regularization terms that are learned
from exemplar faces. The success of the morphable model
relies on the low dimensionality of the solution space; thus, this
method was applied to several other data sets where this
assumption holds, such as human bodies and poses.
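
The structure of this objective, a data term plus a learned regularizer, can be illustrated on a deliberately simplified problem. The sketch below replaces the non-linear rendering function with a direct observation of the 3D shape-vector (for instance, a registered scan), so the minimization becomes a linear least-squares problem with a closed-form solution. This swap, the weight `lam`, and the synthetic basis are assumptions made for illustration and are not part of [BV99].

```python
import numpy as np

def fit_coefficients(observed, mean, basis, sigma, lam):
    """Minimize ||mean + alpha @ basis - observed||^2 + lam * sum((alpha / sigma)^2)."""
    a = basis.T                                        # (3n, k) design matrix
    lhs = a.T @ a + lam * np.diag(1.0 / sigma ** 2)    # normal equations plus PCA prior
    rhs = a.T @ (observed - mean)
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(1)
q, _ = np.linalg.qr(rng.normal(size=(12, 3)))
basis = q.T                                  # 3 orthonormal components in R^12 (synthetic)
sigma = np.array([2.0, 1.0, 0.5])            # assumed standard deviations per component
mean = rng.normal(size=12)

alpha_true = np.array([1.0, -0.5, 0.25])
observed = mean + alpha_true @ basis         # noise-free observation of the full shape

alpha_plain = fit_coefficients(observed, mean, basis, sigma, lam=0.0)
alpha_reg = fit_coefficients(observed, mean, basis, sigma, lam=1.0)
```

With an orthonormal basis the regularized solution is $\alpha_i / (1 + \lambda / \sigma_i^2)$, so components with small variance are pulled most strongly toward the mean shape. In the image-based setting the same trade-off is resolved by iterative non-linear optimization over $\alpha_i$, $\beta_i$, and $p$.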


Morphable human bodies. Allen et al. [ACP03] generalize
the morphable model to characterize human bodies (Figure
10). Given a set of 250 scanned human bodies, the method
first performs non-rigid registration to fit a hole-free,
artist-generated mesh (template) to each of these scans. The result
is a set of mutually consistent parameterized shapes based
on the corresponding vertex positions originating from the
template. Similar to [BV99], the method employs PCA to
characterize the shape space, which enables applications in
shape exploration, synthesis and reconstruction.

In addition to variations in body shapes, human models
exhibit variations in poses. The SCAPE model (Shape
Completion and Animation for PEople) [ASK*05] addresses this
challenge by learning separate models of body deformation:
one accounting for variations in poses and one accounting
for differences in body shapes among humans. The pose
deformation component is acquired from a set of dense 3D scans
of a single person in multiple poses. A key aspect of the pose
model is that it decomposes deformation into a rigid and a
non-rigid component. The rigid component is modeled
using a standard skeleton system. The non-rigid component,
which captures remaining deformations such as flexing of
the muscles, associates each triangle with a local affine
transformation matrix. These transformation matrices are learned
from exemplars using a joint regression model. In [HSS*09],
Hasler et al. introduce a unified model for parameterizing
both shapes and poses. The basic idea is to consider the
relative transformations between all pairs of neighboring
triangles. These transformation matrices allow us to reconstruct
the original shape by solving a least-squares problem. In this
regard, each shape is encoded as a set of edge-wise
transformation matrices, which are fit into the PCA framework to
obtain a statistical model of human shapes. The model is
further used to estimate shapes of dressed humans from range
scans [HSR*09].

Recent works on statistical human shape analysis focus
on combining learned shape priors with sparse observations
and special effects. In [TMB14], the authors introduce an
approach that reconstructs high-quality shapes and poses from a
sparse set of markers. The success of this approach relies on
learning meaningful shape priors from a database consisting of
thousands of shapes. In [LMB14], the authors study how to
understand human breathing from acquired data.

Data-driven tracking. Another problem in shape
reconstruction is object tracking, which aims at creating and
analyzing dynamic shapes and/or poses of physical objects.
Successful tracking techniques (e.g., [WLVGP09, WBLP11,
LYYB13, CWLZ13, CHZ14]) typically utilize parametric
shape spaces. These reduced shape spaces provide shape
priors that improve both the efficiency and robustness of the
tracking process. The ways to utilize and construct shape
spaces vary in different settings, and are typically tailored to
the specific problem setting. Weise et al. [WLVGP09] utilize
a linear PCA subspace trained with a very large set of
pre-processed facial expressions. This method requires an
extended training session with a careful choice of facial action
units. In addition, the learned face model is actor-specific.
These restrictions are partially resolved in [LWP10], which
introduces an example-based blendshape optimization
technique, involving only a limited number of random facial
expressions. In [WBLP11], the authors combine both
blendshapes and data-driven animation priors to improve the
tracking performance. In a recent work, Li et al. [LYYB13]
employ adaptive PCA to further improve the tracking
performance on nuanced emotions and micro-expressions. The
key idea is to combine a general blendshape PCA model and
a corrective PCA model that is updated on-the-fly. This
corrective PCA model captures the details of the specific actor
and missing deformations from the initial blendshape model.

Figure 11: The data-driven shape reconstruction pipeline
proposed in [PMG*05].

6.2. Non-Parametric Methods 

Parametric methods require canonical domains to
characterize the shape space, which have so far been demonstrated
in domains of organic shapes, such as body shapes or faces.
In this section, we discuss another category of methods that
have shown the potential to handle more diverse shape
collections.

Generally speaking, a non-parametric data-driven shape
reconstruction method utilizes a collection of relevant shapes
and combines three phases, i.e., a query phase, a
transformation phase and an assembly phase. Existing methods differ
in how the input shape collection is preprocessed and how
these phases are performed.

Example-based scan completion. Pauly et al. [PMG*05]
introduce one of the first non-parametric systems. As shown
in Figure 11, the method takes a point cloud and a
collection of complete objects as input. The reconstruction
procedure follows all three phases described above. The first
phase determines a set of similar objects; this retrieval phase
combines text-based search with PCA signatures and is
refined by rigid alignment. The second phase performs non-rigid
alignment between the retrieved shapes and the input point
cloud. This step partitions the input point cloud into a set
of patches, where each patch is associated with one retrieved
shape (via the corresponding region). The final phase merges
the corresponding regions into a unified shape.

Nan et al. [NXS12] introduce a similar system for indoor
scene reconstruction. Given an input point cloud of an
indoor scene that consists of a set of objects with known
categories, the method searches in a database of 3D models to
find matched objects and then deforms them in a non-rigid
manner to fit the input point cloud. Note that this method
treats complete 3D objects as building blocks, so the final
reconstruction does not necessarily reflect the original scene.

In contrast to considering entire 3D shapes, Gal et
al. [GSH*07] utilize a dictionary of local shape priors
(defined as patches) for shape reconstruction. The method is
mainly designed for enhancing shape features, where each
region of an input point cloud is matched to a shape patch
in the database. The matched shape patch is then used to
enhance and rectify the local region. Recently, Mattausch et
al. [MPM*14] introduce a patch-based reconstruction
system for indoor scenes. Their method considers recognizing
and fitting planar patches from point cloud data.

Shen and coworkers [SFCH12] extend the idea to
single-object reconstruction by assembling object parts. Their
method utilizes consistently segmented 3D shapes as the
database. Given a scan of an object, it recursively searches
for parts in the database to assemble the original object. The
retrieval phase considers both the geometric similarity
between the input and the retrieved parts and the part
compatibility learned from the input shapes.

Data-driven SLAM. Non-parametric methods have also
found applications in reconstructing temporal geometric
data (e.g., the output of the Kinect scanner). A notable
technique is the simultaneous localization and mapping (SLAM)
method, which jointly estimates the trajectory of the
scanning device and the geometry of the environment. In this
case, shape collections serve as priors for the objects in
the environment, which can be used to train object
detectors. For example, the SLAM++ system proposed by
Salas-Moreno et al. [SMNS*13] trains domain-specific object
detectors from shape collections. The learned detectors are
integrated inside the SLAM framework to recognize and track
those objects. Similarly, Kim et al. [KMYG12] use learned
object models to reconstruct dense 3D models from a single
scan of an indoor scene. More recently, Sun et al. [SX14]
introduce a 3D sliding-window object detector with improved
performance and a broader range of objects.

Shape-driven reconstruction from images. Recently, there
has been growing interest in reconstructing 3D objects directly
from images (e.g., [XZZ*11, KSES14, AME*14, SHM*14]).
This problem introduces fundamental challenges in both
querying similar objects and deforming objects/parts to fit
the input object. In terms of searching for similar objects,
successful methods typically render objects in the database from
a dense set of viewpoints and pick objects for which one view
is similar to the input image object. Since the depth
information is missing from the image object, it is important to
properly regularize 3D object transformations; otherwise, a
3D object may be deformed arbitrarily even though its
projection on the image domain matches the image object. Most
existing techniques consider rigid transformations or
user-specified deformations [XZZ*11]. In a recent work, Su et
al. [SHM*14] propose to learn meaningful deformations of
each shape from its optimal deformations to similar shapes.

7. Data-driven Shape Modeling and Synthesis 

So far, the creation of detailed three-dimensional content
remains a tedious task confined to skilled artists. 3D content
creation has been a major bottleneck hindering the
development of ubiquitous 3D graphics. Thus, providing easy-to-use
tools for casual and novice users to design and create 3D
models has been a key challenge in computer graphics. To
address this challenge, current literature has been focused on
two main directions, i.e., intelligent interfaces for interactive
shape modeling and smart models for automated model
synthesis. The former strives to endow modeling interfaces with
higher-level understanding of the structure and semantics of
3D shapes, allowing the interface to reason about the
incomplete shape being modeled. The latter direction focuses
on developing data-driven models to synthesize new shapes
automatically. The core problem is to learn generative shape
models from a set of exemplars (e.g., probability
distributions, fitness functions, functional constraints, etc.) so that the
synthesized shapes are plausible and novel. It can be seen
that both paradigms depend on data-driven
modeling of shape structures and semantics. With the availability
of large 3D shape collections, data-driven approaches seem a
promising way to break through the content creation bottleneck.

Figure 12: Given a library of models, a Bayesian
network encoding semantic and geometric relationships among
shape parts is learned [CKGK11] (top). The modeling
process (bottom) performs probabilistic inference in the learned
Bayesian network to generate ranked lists of category
labels and components within each category, customized for
the currently assembled model.


7.1. Interactive Shape Modeling and Editing 

Interactive 3D modeling software (3DS Max, Maya, etc.)
provides artists with a large set of powerful tools for
creating and editing very detailed 3D models, which are,
however, often onerous to harness for non-professional users. For
casual users, more intuitive modeling interfaces with certain
intelligence are preferred. Below we discuss such methods
for assembly-based modeling and guided shape editing.

Data-driven part assembly. Early works on 3D modeling
based on shape sets are primarily driven by the purpose of
content reuse in part-assembly based modeling approaches.
The seminal work of modeling by example [FKS*04]
presents a pioneering system of shape modeling by
searching a shape database for parts to reuse in the construction
of new shapes. Kraevoy et al. [KJS07] describe a system for
shape creation via interchanging parts between a small set of
compatible shapes. Guo et al. [GLXJ14] propose
assembly-based creature modeling guided by a shape grammar.

Beyond content reuse through database queries or
hand-crafted rules, Chaudhuri and Koltun [CK10a] propose a
data-driven technique for suggesting to the modeler shape
parts that can potentially augment the current shape being
built. Such part suggestions are generated by querying
a shape database based on partial shape matching.
Although this is a purely geometric method that does not
account for the semantics of shape parts, it represents the first
attempt at utilizing a shape database to augment the modeling
interface. Later, Chaudhuri et al. [CKGK11] show that the
incorporation of semantic relationships increases the
relevance of presented parts. Given a repository of 3D shapes,
the method learns a probabilistic graphical model
encoding semantic and geometric relationships among shape parts.
During modeling, inference in the learned Bayesian network
is performed to produce a relevance ranking of the parts.

Figure 13: Given a hundred training airplanes (in green),
the probabilistic model from [KCKK12] synthesizes several
hundreds of new airplanes (in blue).


A common limitation of the above techniques is that they
do not provide a way to directly express a high-level design
goal (e.g. “create a cute toy”). Chaudhuri et al. [CKG*13]
proposed a method that learns semantic attributes for shape
parts that reflect the high-level intent people may have for
creating content in a domain (e.g. adjectives such as
“dangerous”, “scary” or “strong”) and ranks them according to the
strength of each learned attribute (Figure 5). During an
interactive session, the user explores and modifies the strengths
of semantic attributes to generate new part assemblies.

3D shape collections can supply other useful
information, such as contextual and spatial relationships between
shape parts, to enhance a variety of modeling interfaces.
Xie et al. [XXM*13] propose a data-driven sketch-based
3D modeling system. In the off-line learning stage, a shape
database is pre-analyzed to extract the contextual
information among parts. During the online stage, the user designs a
3D model by progressively sketching its parts and retrieving
and assembling shape parts from the database. Both the
retrieval and assembly are assisted by the precomputed
contextual information so that more relevant parts can be returned
and selected parts can be automatically placed. Inspired by
the ShadowDraw system [LZC11], Fan et al. [FWX*13]
propose 3D modeling by drawing with data-driven shadow
guidance. The user’s strokes are used to query a 3D shape
database for generating the shadow image, which in turn can
guide the user’s drawing. As the drawing progresses, 3D
candidate parts are retrieved for assembly-based modeling.

Data-driven editing and variation. The general idea of
data-driven shape editing is to learn from a collection of
closely related shapes a model that characterizes the plausible
variation or deformation of the shapes, and use the learned
model to constrain the user’s edit to maintain plausibility.
For organic shapes, such as human faces [BV99, CWZ*14]
or bodies [ACP03], parametric models can be learned from
a shape set characterizing its shape space. Such parametric
models can be used to edit the shapes through exploring the
shape space with the set of parameters.

An alternative approach is the widely adopted
analyze-and-edit paradigm, which first extracts the structure from the
input shape and then tries to preserve the structure by
constraining the editing [GSMCO09]. Instead of learning
structure from a single shape, which usually relies on prior
knowledge, Fish et al. [FAvK*14] learn it from a set of
shapes belonging to the same family, resulting in a set of
geometric distributions characterizing the part arrangements.
These distributions can be used to guide structure-preserving
editing, where models can be edited while maintaining their
familial traits. Yumer et al. [YK14] extract co-constrained
handles from a set of shapes for shape deformation. The
handles are generated based on co-abstraction [YK12] of the set
of shapes and the deformation co-constraints are learned
statistically from the set.

Based on learned structure from a database of 3D
models, Xu et al. [XZZ*11] propose photo-inspired 3D object
modeling. Guided by the object in a photograph, the method
creates a 3D model as a geometric variation of a
candidate model retrieved from the database. Due to the
pre-analyzed structural information, the method addresses the
ill-posed problem of 3D modeling from a single 2D
image via structure-preserving 3D warping. The final result is
structurally plausible and is readily usable for subsequent
editing. Moreover, the resulting 3D model, although built
from a single view, is structurally coherent from all views.

7.2. Automated Synthesis of Shapes 

Many applications such as 3D games and films require large
collections of 3D shapes for populating their environments.
Modeling each shape individually can be tedious even with
the best interactive tools. The goal of data-driven shape
synthesis algorithms is to generate several shapes automatically
with no or very little user supervision: the user may only
provide some preferences or high-level specifications to control
the shape synthesis. Existing methods achieve this task by
using probabilistic generative models of shapes, evolutionary
methods, or learned probabilistic grammars.

Statistical models of shapes. The basic idea of these
methods is to define a parametric shape space and then fit a
probability distribution to the data points that represent the
input exemplar shapes. Since the input shapes are assumed to
be plausible and desired representatives of the shape space,
high-probability areas of the shape space tend to
become associated with new, plausible shape variants. This
idea was first explored in the context of parametric
models [BV99, ACP03], discussed in Section 6. By
associating each principal component of the shape space defined
by these methods with a Gaussian distribution, this
distribution can be sampled to generate new human faces or
bodies (Figure 10). Since the probability distribution of
plausible shapes tends to be highly non-uniform in several shape
classes, Talton et al. [TGY*09] use kernel density
estimation with Gaussian kernels to represent plausible shape
variability. The method is demonstrated to generate new shapes
based on tree and human body parametric spaces.
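
Kernel density estimation over a parametric shape space can be sketched in a few lines of standard-library Python. The sketch assumes that each exemplar is a vector of shape parameters and that an isotropic Gaussian kernel with a single hand-picked bandwidth is adequate. It does not reproduce the specific kernels or parameter spaces of [TGY*09], and all names are hypothetical.

```python
import math
import random

def kde_density(x, exemplars, bandwidth):
    """Average of isotropic Gaussian kernels centered at the exemplar parameters."""
    d = len(x)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (-d / 2.0)
    total = 0.0
    for e in exemplars:
        sq_dist = sum((xi - ei) ** 2 for xi, ei in zip(x, e))
        total += norm * math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / len(exemplars)

def kde_sample(exemplars, bandwidth, rng):
    """Draw from the KDE: pick an exemplar uniformly, then perturb it with Gaussian noise."""
    center = rng.choice(exemplars)
    return [c + rng.gauss(0.0, bandwidth) for c in center]

# Toy 2D parameter space with two clusters of exemplar shapes.
exemplars = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0]]
rng = random.Random(1)
new_params = kde_sample(exemplars, bandwidth=0.5, rng=rng)

near = kde_density([0.1, 0.05], exemplars, bandwidth=0.5)   # close to two exemplars
far = kde_density([1.5, 1.5], exemplars, bandwidth=0.5)     # between the clusters
```

Unlike a single Gaussian fitted to all exemplars, this estimate assigns low density to the empty region between the clusters, which is the motivation for using it when the distribution of plausible shapes is highly non-uniform.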

Shapes have structure, i.e., shapes vary in terms of their
type and style, different shape styles have different
numbers and types of parts, parts have various sub-parts that can
be made of patches, and so on. Thus, to generate shapes
in complex domains, it is important to define shape spaces
over structural and geometric parameters, and capture
hierarchical relationships between these parameters at
different levels. Kalogerakis et al. [KCKK12] (Figure 13)
proposed a probabilistic model that represents variation and
relationships of geometric descriptors and adjacency features
for different part styles, as well as variation and
relationships of part styles and repetitions for different shape styles.
The method learns the model from a set of consistently
segmented shapes. Part and shape styles are discovered based on
latent variables that capture the underlying modes of shape
variability. Instead of sampling, the method uses a search
procedure to assemble new shapes from parts of the
input shapes according to the learned probability distribution.
Users can also set preferences for generating shapes from a
particular shape style, with given part styles or specific parts.

Set evolution. Xu et al. [XZCOC12] developed a method
for generating shapes inspired by the theory of evolution in
biology. The basic idea of set evolution is to define
crossover and mutation operators on shapes to perform part
warping and part replacement. Starting from an initial generation
of shapes with part correspondences and built-in structural
information such as inter-part symmetries, these operators
are applied to create a new generation of shapes. A selected
subset from the generation is presented via a gallery to the
user, who provides feedback to the system by rating them.
The ratings are used to define the fitness function for the
evolution. Through the evolution, the set is personalized and
populated with shapes that better fit the user’s preferences. At
the same time, the system explicitly maintains the diversity of
the population so as to prevent it from converging into an
“elite” set.
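
One generation of such an evolutionary loop can be sketched as follows. The sketch is a toy and not the system of [XZCOC12]. Shapes are assumed to be dictionaries over corresponding part slots, part warping is reduced to jittering a single scale value, user ratings are stood in for by a hypothetical `fitness` function, and the diversity-preserving step is omitted.

```python
import random

def make_shape(name):
    """A toy shape: every part slot records its source model and a scale factor."""
    return {slot: {"source": name, "scale": 1.0} for slot in ("back", "seat", "legs")}

def crossover(parent_a, parent_b, rng):
    """Part replacement: each slot is inherited from one of the two parents."""
    return {slot: dict(rng.choice((parent_a[slot], parent_b[slot]))) for slot in parent_a}

def mutate(shape, rng, sigma=0.1):
    """Part warping: jitter the scale of one randomly chosen part."""
    child = {slot: dict(part) for slot, part in shape.items()}
    slot = rng.choice(sorted(child))
    child[slot]["scale"] *= 1.0 + rng.gauss(0.0, sigma)
    return child

def evolve(population, fitness, rng, n_children):
    """Create a new generation; parents are drawn in proportion to their fitness."""
    weights = [fitness(shape) for shape in population]
    children = []
    for _ in range(n_children):
        a, b = rng.choices(population, weights=weights, k=2)
        children.append(mutate(crossover(a, b, rng), rng))
    return children

def fitness(shape):
    """Hypothetical stand-in for user ratings: this user likes parts of model A."""
    return 1.0 + sum(part["source"] == "A" for part in shape.values())

population = [make_shape("A"), make_shape("B"), make_shape("C")]
children = evolve(population, fitness, random.Random(0), n_children=5)
```

Because slots are in correspondence, every child is a structurally valid recombination of its parents. In an interactive setting the `fitness` values would be refreshed from the gallery ratings after each generation.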



Learned Shape Grammars. Talton et al. [TYK*12]
leverage techniques from natural language processing to learn
probabilistic generative grammars of shapes. The method
takes as input a set of exemplar shapes represented with a
scene graph specifying parent/child relationships and
relative transformations between labeled shape components.
They use Bayesian inference to learn a probabilistic formal
grammar that can be used to synthesize novel shapes.

8. Data-driven Scene Analysis and Synthesis

Analyzing and modeling indoor and outdoor environments
has important applications in various domains. For example,
in robotics it is essential for an autonomous agent to
understand semantics of 3D environments to be able to interact
with them. In urban planning and architecture, professionals
build digital models of cities and buildings to validate and
improve their designs. In computer graphics, artists create
novel 3D scenes for movies and video games.

Growing numbers of 3D scenes in digital repositories
provide new opportunities for data-driven scene analysis,
editing, and synthesis. Emerging collections of 3D scenes pose
novel research challenges that cannot be easily addressed
with existing tools. In particular, representations created for
analyzing collections of single models mostly focus on
arrangement and relations between shape parts [MWZ*14],
which usually exhibit less variation than objects in scenes.
Capturing scene structure poses a greater challenge due to
looser spatial relations and a more diverse mixture of
functional substructures.

Inferring scene semantics is a long-standing problem in
image understanding, with many methods developed for
object recognition [QT09], classification [SW10], inferring
spatial layout [CCPS13], and other 3D information [FGH13]
from a single image. Previous work demonstrates that one
can leverage collections of 3D models to facilitate scene
understanding in images [SLH12]. In addition, the RGBD
scans that include depth information can be used as training
data for establishing the link between 2D and 3D for
model-driven scene understanding [SKHF12]. Unfortunately,
semantic annotations of images are not immediately useful for
modeling and synthesizing 3D scenes, where priors have to
be learned from 3D data.

In this section, we cover data-driven techniques that
leverage collections of 3D scenes for modeling, editing, and
synthesizing novel scenes.

Figure 14: Scene comparisons may yield different similarity
distances (left) depending on the focal points [XMZ*14].


Context-based retrieval. To address large variance in
arrangements and geometries of objects in scenes, Fisher et
al. [FH10, FSH11] suggest taking advantage of local context.
One of the key insights of their work is that collections of
3D scenes provide rich information about the context in which
objects appear. They show that capturing these contextual
priors can help in scene retrieval and editing.

Figure 15: The algorithm processes raw scene graphs with possible
over-segmentation (a) into consistent hierarchies capturing semantic and
functional groups (b,c) [LCK*14].

Their system takes an annotated collection of 3D scenes as input, where
each object in a scene is classified. They represent each scene as a
graph, where nodes represent objects and edges represent relations
between objects, such as support and surface contact. In order to compare
scenes, they define kernel functions for pairs of nodes, measuring
similarity in the objects' geometry, and for pairs of edges, measuring
similarity in the relations of two pairs of objects. They further define
a graph kernel to compare pairs of scenes. In particular, they compare
all walks of fixed length originating at all pairs of objects in both
scene graphs, which loosely captures similarities of all contexts in
which objects appear [FSH11]. They show that this similarity metric can
be used to retrieve scenes. By comparing only walks originating at a
particular object, they can retrieve objects for interactive scene
editing.
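
The walk-based comparison described above can be sketched in a few lines
of Python. This is a simplified illustration rather than the
implementation of [FSH11]: the graph encoding (dictionaries of labeled
nodes and directed, labeled edges), the function name `walk_kernel`, and
the use of exact-match (delta) node and edge kernels are all assumptions
made for the example.

```python
from functools import lru_cache


def delta(x, y):
    """Exact-match kernel; a stand-in for geometric/relational kernels."""
    return 1.0 if x == y else 0.0


def walk_kernel(graph_a, graph_b, length, node_k=delta, edge_k=delta):
    """Sum the similarities of all pairs of walks with `length` edges.

    Each graph is a dict with
      "nodes": {node_id: label}
      "edges": {node_id: [(neighbor_id, relation), ...]}
    """

    @lru_cache(maxsize=None)
    def k(p, r, s):
        # Similarity of walks with p edges starting at r (in A) and s (in B).
        base = node_k(graph_a["nodes"][r], graph_b["nodes"][s])
        if p == 0 or base == 0.0:
            return base
        total = 0.0
        for r2, rel_a in graph_a["edges"].get(r, []):
            for s2, rel_b in graph_b["edges"].get(s, []):
                total += edge_k(rel_a, rel_b) * k(p - 1, r2, s2)
        return base * total

    return sum(
        k(length, r, s) for r in graph_a["nodes"] for s in graph_b["nodes"]
    )


bedroom = {
    "nodes": {0: "bed", 1: "nightstand", 2: "lamp"},
    "edges": {0: [(1, "adjacent")], 1: [(2, "supports")]},
}
office = {
    "nodes": {0: "desk", 1: "lamp"},
    "edges": {0: [(1, "supports")]},
}
```

Restricting the outer sum to a single fixed start node `r` gives the
object-centric variant mentioned in the text, which scores the context of
one object instead of the whole scene.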

Focal Points. Measuring similarity of complex hybrid scenes such as
studios composed of bedroom, living room, and dining room poses a
challenge to graph kernel techniques since they only measure global scene
similarity. Thus, Xu et al. [XMZ*14] advocate analyzing salient
sub-scenes, which they call focal points, to compare hybrid scenes, i.e.,
scenes containing multiple salient sub-scenes. Figure 14 shows an example
of comparing complex scenes, where the middle scene is a hybrid one
encompassing two semantically salient sub-scenes, i.e., bed-nightstands
and TV-table-sofa. The middle scene is closer to the left one when the
bed and nightstands are focused on, and closer to the right one when the
TV-table-sofa combo is the focal point. Therefore, scene comparison may
yield different similarity distances depending on the focal points.

Formally, a focal point is defined as a representative substructure of a
scene which can characterize a semantic scene category. That means the
substructure should re-occur frequently only within that category.
Therefore, focal point detection is naturally coupled with the
identification of scene categories via scene clustering. This poses
coupled problems of detecting focal points based on scene groups and
grouping scenes based on focal points. These two problems are solved via
interleaved optimization, which alternates between focal point detection
and focal-based scene clustering. The former is achieved by mining
frequent substructures and the latter uses subspace clustering, where
scene distances are defined in a focal-centric manner. Inspired by the
work of Fisher et al. [FSH11], scene distances are computed using
focal-centric graph kernels, which are estimated from walks originating
from representative focal points.

The detected focal points can be used to organize the scene collection
and to support efficient exploration of the collection (see Section 9).
Focal-based scene similarity can be used for novel applications such as
multi-query scene retrieval, where one may issue queries consisting of
multiple semantically related scenes and wish to retrieve more scenes “of
the same kind”.

Figure 16: The interaction bisector surface (in blue) of several
two-object scenes [ZWK14].

Synthesis. Given an annotated scene collection, one can also synthesize
new scenes that have a similar distribution of objects. The scene
synthesis technique of Fisher et al. [FRS*12] learns two probabilistic
models from the training dataset: (1) object occurrence, indicating which
objects should be placed in the scene, and (2) layout optimization,
indicating where to place the objects. Next, it takes an example scene,
and then synthesizes similar scenes using the learned priors. It replaces
or adds new objects using context-based retrieval techniques, and then
optimizes for object placement based on learned object-to-object spatial
relations. Providing an example scene may itself be a challenging task;
thus, Xu et al. [XCF*13] propose modeling 3D indoor scenes from 2D
sketches by leveraging a database of 3D scenes. Their system jointly
optimizes for sketch-guided co-retrieval and co-placement of all objects.
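
The two learned ingredients, an occurrence prior and a placement prior,
can be illustrated with a deliberately simplified Python sketch. It is
not the model of [FRS*12], which uses a Bayesian network and mixture
models; here, as stated assumptions, each scene contains at most one
object per category, objects are 2D points, occurrence is a per-category
frequency, and placement is the mean offset from a hypothetical anchor
category.

```python
import random
from collections import defaultdict


def learn_priors(scenes, anchor):
    """Estimate occurrence frequencies and mean offsets to `anchor`.

    scenes: list of dicts {category: (x, y)}.
    """
    counts = defaultdict(int)
    offsets = defaultdict(list)
    for scene in scenes:
        for cat, (x, y) in scene.items():
            counts[cat] += 1
            if anchor in scene and cat != anchor:
                ax, ay = scene[anchor]
                offsets[cat].append((x - ax, y - ay))
    occurrence = {cat: n / len(scenes) for cat, n in counts.items()}
    mean_offset = {
        cat: (
            sum(dx for dx, _ in v) / len(v),
            sum(dy for _, dy in v) / len(v),
        )
        for cat, v in offsets.items()
    }
    return occurrence, mean_offset


def synthesize(occurrence, mean_offset, anchor, anchor_pos, rng):
    """Sample which categories appear, then place them near the anchor."""
    scene = {anchor: anchor_pos}
    for cat, prob in occurrence.items():
        if cat != anchor and cat in mean_offset and rng.random() < prob:
            dx, dy = mean_offset[cat]
            scene[cat] = (anchor_pos[0] + dx, anchor_pos[1] + dy)
    return scene


training = [
    {"desk": (0.0, 0.0), "chair": (0.0, -1.0), "lamp": (0.5, 0.0)},
    {"desk": (2.0, 1.0), "chair": (2.0, 0.0)},
]
```

A fuller treatment would replace the mean offset with a distribution over
relative placements and optimize the layout jointly instead of placing
every object independently.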

Hierarchical scene annotation. All aforementioned applications take an
annotated collection of 3D scenes as input. Unfortunately, most scenes in
public repositories are not annotated and thus require additional manual
labeling [FRS*12]. Liu et al. [LCK*14] address the challenge of
annotating novel scenes. The key observation of their work is that
understanding the hierarchical structure of a scene enables efficient
encoding of functional scene substructures, which significantly
simplifies detecting objects and representing their relationships. Thus,
they propose a supervised learning approach to estimate hierarchical
structure for novel scenes. Given a collection of scene graphs with
consistent hierarchies and labels, they train a probabilistic
hierarchical grammar encoding the distributions of shapes, cardinalities,
and spatial relationships between objects. Such a grammar can then be
used to parse new scenes: find segmentations, object labels, and a
hierarchical organization of objects consistent with the annotated
collection (see Figure 15).

Challenges and opportunities. The topic of 3D scene analysis is quite new
and there are many open problems and research opportunities. The first
problem is to efficiently characterize spatial relationships between
objects and object groups. Most existing methods work with a bounding box
representation, which is efficient to process but not sufficiently
informative to characterize object-to-object relationships. For example,
one cannot reliably determine the object enclosure relationship based on
a bounding box. Recently, Zhao et al. [ZWK14] proposed using a
biologically inspired bisector surface to characterize the geometric
interaction between adjacent objects and to index 3D scenes (Figure 16).
Second, most existing techniques rely heavily on expert user supervision
for scene understanding. Unfortunately, online repositories rarely have
models with reliable object tags. Therefore, there is a need for methods
that can leverage scenes with partial and noisy annotations. Finally, the
popularity of commodity RGBD cameras has significantly simplified the
acquisition of indoor scenes. This emerging scanning technique opens
space for new applications such as online scene analysis with
high-fidelity scanning and reconstruction. The availability of image data
that comes with RGBD scans also enables enhancing geometric
representations with appearance information.

9. Exploration and Organization 

The rapidly growing number and diversity of digital 3D models in large
online collections (e.g., TurboSquid, Trimble 3D Warehouse, etc.) have
created a need to develop algorithms and techniques that effectively
organize these large collections and allow users to interactively explore
them. For example, an architect can furnish a digital building by
searching in databases organized according to furniture types, regions of
interest, and design styles, or an industrial designer can explore shape
variations among existing products when creating a new object. Most
existing repositories only support text-based search, relying on
user-entered tags and titles. This approach suffers from inaccurate and
ambiguous tags, often entered in different languages. While it is
possible to try using shape analysis to infer consistent tags as
discussed in Section 3, it is sometimes hard to convey stylistic and
geometric variations using only text. An alternative approach is to
perform shape-, sketch-, or image-based queries; however, to formulate
such search queries the user needs to have a clear mental model of the
shape that should be retrieved. Thus, some researchers focus on providing
tools for exploring shape collections. Unlike search, exploration
techniques do not assume a priori knowledge of the repository content,
and help the user to understand geometric, topological, and semantic
variations within the collection.

Problem statement and method categorization. Data exploration and
organization is a classical problem in data analysis and visualization
[PEP*11]. Given a data collection, the research focuses on grouping and
relating data points, learning the data variations in the collection, and
organizing the collection into a structured form, to facilitate
retrieval, browsing, summarization, and visualization of the data, based
on efficient interfaces or metaphors.

The first step to organizing model collections is to devise appropriate
metrics to relate different data points. Various similarity metrics have
been proposed in the past to relate entire shapes as well as local
regions on shapes. In particular, previous sections of this document
cover algorithms for computing global shape similarities (Section 3),
part-wise correspondences (Section 4), and point-wise correspondences
(Section 5). In this section, we will focus on techniques that take
advantage of these correlations to provide different interfaces for
exploring and understanding geometric variability in collections of 3D
shapes. We categorize the existing exploration approaches based on four
aspects:

• Metaphor: a user interface for exploring shape variations. We will
discuss five basic exploration interfaces, ones that use proxy shapes
(templates), regions of interest, probability plots, query shapes, or
continuous attributes.

• Shape comparison: techniques used to relate different shapes. We will
discuss techniques that use global shape similarities, and part or point
correspondences.

• Variability: shape variations captured by the system. Most methods we
will discuss rely on geometric variability of shapes or parts. Some
techniques also take advantage of topological variability, that is,
variance in the number of parts or how they are connected (or variance in
the numbers of objects and their arrangements in scenes).

• Organization form: a method to group shapes. We will discuss methods
that group similar shapes to facilitate exploring intra-group
similarities and inter-group variations, typically including clustering
and hierarchical clustering.

Method     | Meta. | Comp. | Var.  | Org.
-----------+-------+-------+-------+----------
[OLGM11]   | temp. | simi. | geom. | n/a
[KLM*13]   | temp. | part  | both  | cluster
[AKZM14]   | temp. | part  | both  | cluster
[KLM*12]   | ROI   | point | both  | n/a
[ROA*13]   | ROI   | point | geom. | n/a
[HWG14]    | ROI   | point | both  | cluster
[XMZ*14]   | ROI   | simi. | topo. | cluster
[FAvK*14]  | plot  | part  | geom. | cluster
[HSS*13]   | query | simi. | both  | hierarchy

Table 3: A summary of several recent works over four aspects. Metaphor
(Meta.): templates, surface-painted ROIs, probability distribution plots,
or query shapes. Shape comparison (Comp.): shape similarity, part or
point correspondence. Variability (Var.): geometry, topology, or both.
Organization form (Org.): cluster or hierarchy.

Table 3 summarizes several representative works in terms of 
these aspects. In the remaining part of this section we list 
several recent techniques grouped based on the exploration 
metaphor. 

Template-based exploration. Component-wise variability in positions and
scales of parts reveals useful information about a model collection.
Several techniques use box-like templates to show variations among models
of the same class. Ovsjanikov et al. [OLGM11] describe a technique for
learning these part-wise variations without solving the challenging
problem of consistent segmentation. First, they use a segmentation of a
single shape to construct the initial template. This is the only step
that needs to be verified and potentially fixed by the user. The next
goal is to automatically infer deformations of the template that capture
the most important geometric variations of the models in the collection.
They hypothesize that all shapes can be projected onto a low-dimensional
manifold based on their global shape descriptors. Finally, they reveal
the manifold structure by deforming a template to fit the sample points.
Directions of interesting variations are depicted by arrows on the
template, and the shapes that correspond to the current template
configuration are presented to the user.

The descriptor-based approach described above assumes that all shapes
share the same parts and that there exists a low-dimensional manifold
that can be captured by deforming a single template. These assumptions do
not hold for large and diverse collections of 3D models. To tackle this
challenge, Kim et al. [KLM*13] proposed an algorithm for learning several
part-based templates capturing multi-modal variability in collections of
shapes. They start with an initial template that includes a super-set of
all parts that might occur in a dataset, and jointly learn part
segmentations, point-to-point surface correspondences, and a compact
deformation model. The output is a set of templates that groups the input
models into clusters capturing their styles and variations.

Figure 17: Shape exploration based on fuzzy correspondence. The user
paints a region of interest (ROI) on a query shape (left column), and the
method sorts models based on their similarity within the region (right).

ROI-based exploration. Not all interesting variations occur at the scale
of parts: they can occur at sub-part scale, or span multiple sub-regions
from multiple parts. In these cases the user may prefer to select an
arbitrary region on a 3D model and look for more models sharing similar
regions of interest. Such detailed and flexible queries require a finer
understanding of correspondences between different shapes. Kim et al.
[KLM*12] propose fuzzy point correspondences to encode the inherent
ambiguity in relating diverse shapes. Fuzzy point correspondences are
represented by real values specified for all pairs of points, indicating
how well the points correspond. They leverage transitivity in
correspondence relationships to compute this representation from a sparse
set of pairwise point correspondences. The interface proposed by Kim et
al. allows painting regions of interest directly on a surface, and the
system retrieves similar regions or shows geometric variations in the
selected region (see Figure 17).
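
As a rough illustration of how fuzzy correspondences can drive an
ROI-based query, the Python sketch below ranks models by a
correspondence-weighted feature difference inside a painted region. The
scoring rule, the scalar per-point features, and the function names are
assumptions made for this example and are not taken from [KLM*12]; the
sketch also assumes the fuzzy correspondence values have already been
computed.

```python
def roi_distance(roi, query_feat, target_feat, fuzzy):
    """Correspondence-weighted feature difference within the ROI.

    roi:         iterable of point indices painted on the query shape
    query_feat:  per-point scalar feature on the query shape
    target_feat: per-point scalar feature on the target shape
    fuzzy[i][j]: value in [0, 1], how well query point i matches
                 target point j
    """
    num = den = 0.0
    for i in roi:
        for j, weight in enumerate(fuzzy[i]):
            num += weight * abs(query_feat[i] - target_feat[j])
            den += weight
    # A target with no correspondence mass in the ROI is ranked last.
    return num / den if den > 0 else float("inf")


def rank_models(roi, query_feat, models):
    """Sort model names from most to least similar within the ROI.

    models: {name: (target_feat, fuzzy)}
    """
    return sorted(
        models, key=lambda name: roi_distance(roi, query_feat, *models[name])
    )


query_feat = [1.0, 2.0, 3.0]
one_to_one = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
models = {
    "B": ([5.0, 2.0], one_to_one),
    "A": ([1.0, 2.0], one_to_one),
}
```

Because the weights are soft, a query point that corresponds ambiguously
to several target points contributes to all of them in proportion to the
correspondence values, which is the property that makes the fuzzy
representation useful for diverse collections.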

One limitation of correspondence-based techniques is that they typically
do not consider the entire collection when estimating shape differences.
Rustamov et al. [ROA*13] focus on a fundamental intrinsic representation
for shape differences. Starting with a functional map between two shapes,
that is, a map that describes a change of functional basis, they derive a
shape difference operator revealing detailed information about the
location, type, and magnitude of the distortion induced by a map. This
makes shape difference a quantifiable object that can be co-analyzed
within the context of the entire collection. They show that this deeper
understanding of shape differences can help in exploration. For example,
one can embed shapes in a low-dimensional space based on shape
differences, or use shape differences to interpolate variations by
showing “intermediate” shapes between two regions of interest. To extend
these techniques to man-made objects, Huang et al. [HWG14] construct
consistent functional bases for shape collections that exhibit large
geometric and topological variability. They show that the resulting
consistent maps can capture discrete topological variability, such as
variance in the number of bars in the back of a chair.

Figure 18: Focal-based scene clustering produces overlapping clusters,
which is due to hybrid scenes possessing multiple focal points. An
exploratory path, from (a) to (e), through the overlap, smoothly
transitions between the two scene clusters, representing bedrooms and
offices, respectively.

ROI-based scene exploration. Recent works on organizing and exploring 3D
visual data mostly focus on object collections. Exploring 3D scenes poses
additional challenges since they typically exhibit more variance in
structure. Unlike man-made objects, which usually consist of a handful of
parts, scenes usually include tens to hundreds of objects, and most
objects do not have a prescribed rigid arrangement. Thus, global scene
similarity metrics, such as the graph kernel based technique of Fisher et
al. [FSH11], are limited to organizing datasets based on very high-level
features, such as scene type. Xu et al. [XMZ*14] advocate that 3D scenes
should be compared from the perspective of a particular focal point,
which is a representative substructure of a specific scene category.
Focal points are detected through contextual analysis of a collection of
scenes, resulting in a clustering of the scene collection where each
cluster is characterized by its representative focal points (see
Section 8). Consequently, the focal points extracted from a scene
collection can be used to organize the collection into an interlinked and
well-connected cluster formation, which facilitates scene exploration.
Figure 18 shows an illustration of such cluster-based organization and an
exploratory path transiting between two scene clusters/categories.

Figure 19: Given a set of heterogeneous shapes, a reliable qualitative
similarity is derived from quartets composed of two pairs of objects
(left). Aggregating such qualitative information from many quartets
computed across the whole set leads to a categorization tree as a
hierarchical organization of the input shape collection (right).

Plot-based exploration. The aforementioned exploration techniques
typically do not visualize the probabilistic nature of shape variations.
Fish et al. [FAvK*14] study the configurations of shape parts from a
probabilistic perspective, trying to indicate which shape variations are
more likely to occur. To learn the distributions of part arrangements,
all shapes in the family are pre-segmented consistently. The resulting
set of probability density functions (PDFs) characterizes the variability
of relations and arrangements across different parts. A peak in a PDF
curve represents a configuration of the related parts that appears
frequently among the shapes in the family. The multiple PDFs can be used
as interfaces to interactively explore the shape family from various
perspectives. Averkiou et al. [AKZM14] use the part structure inferred by
the template-based method of Kim et al. [KLM*13] to produce a
low-dimensional part-aware embedding of all models. The user can explore
interesting variations in part arrangements simply by moving the mouse
over the 2D embedding. In addition, their technique allows synthesizing
novel shapes by clicking on empty regions of the embedded space. Upon a
click, the system deforms parts from neighboring shapes to synthesize a
novel part arrangement.
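
The role of PDF peaks can be illustrated with a one-dimensional kernel
density estimate, the estimator listed for [FAvK*14] in Table 4. The
Python sketch below is only an illustration under stated assumptions: the
part relation is reduced to a single scalar (a hypothetical seat-to-back
angle in degrees), the bandwidth is fixed by hand, and peaks are found by
a simple grid search.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return pdf


def find_peaks(pdf, lo, hi, steps=200):
    """Locate local maxima of `pdf` on a regular grid over [lo, hi]."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [pdf(x) for x in xs]
    return [
        xs[i]
        for i in range(1, steps)
        if ys[i] > ys[i - 1] and ys[i] >= ys[i + 1]
    ]


# Hypothetical seat-to-back angles measured over a family of chairs.
angles = [88.0, 90.0, 91.0, 92.0, 109.0, 110.0, 112.0]
pdf = gaussian_kde(angles, bandwidth=3.0)
peaks = find_peaks(pdf, 80.0, 120.0)
```

With these samples the estimate has two modes, one for upright backs and
one for reclined backs; each mode corresponds to a frequently occurring
configuration that an interface could offer as an exploration target.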

Query-based exploration. For a heterogeneous shape collection
encompassing diverse object classes, it is typically not possible to
capture shape part structure and correspondence. Even global shape
similarity is not a very reliable feature, which makes organizing and
exploring heterogeneous collections especially difficult. To address this
challenge, Huang et al. [HSS*13] introduce qualitative analysis from the
bioinformatics field. Instead of relying on quantitative distances, which
may be unreliable between dissimilar shapes, the method considers more
reliable qualitative similarity derived from quartets composed of two
pairs of objects. The shapes that are paired in a quartet are close to
each other and far from the shapes in the other pair, where distances are
estimated from multiple shape descriptors. They aggregate this
topological information from many quartets computed across the entire
shape collection, and construct a hierarchical categorization tree (see
Figure 19). Analogous to the phylogenetic trees of species, the
categorization tree of a shape collection provides an overview of the
shapes in terms of their mutual distances and hierarchical relations.
Based on this organization, they also define a degree of separation chart
for every shape in the collection and apply it to interactive shape
exploration.
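
The quartet idea can be sketched as follows in Python. For four shapes
there are three ways to split them into two pairs; the sketch picks the
split with the smallest within-pair distances and keeps it only if the
pairs are well separated. The reliability test and its `ratio` threshold
are hypothetical choices made for the example, and the aggregation of
quartets into a categorization tree is not shown.

```python
from itertools import combinations


def best_quartet_split(dist, quartet):
    """Return the pairing of four shapes with minimal within-pair cost."""
    a, b, c, d = quartet
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def cost(split):
        (p, q), (r, s) = split
        return dist[p][q] + dist[r][s]

    return min(splits, key=cost)


def is_reliable(dist, split, ratio=0.5):
    """Hypothetical test: within-pair distances must be small relative
    to the smallest distance between the two pairs."""
    (p, q), (r, s) = split
    within = max(dist[p][q], dist[r][s])
    across = min(dist[p][r], dist[p][s], dist[q][r], dist[q][s])
    return within <= ratio * across


def reliable_quartets(dist, ratio=0.5):
    """Enumerate all quartets and keep the reliable splits."""
    result = []
    for quartet in combinations(range(len(dist)), 4):
        split = best_quartet_split(dist, quartet)
        if is_reliable(dist, split, ratio):
            result.append(split)
    return result


# Toy symmetric distance matrix: shapes 0, 1 are alike and 2, 3 are alike.
dist = [
    [0.0, 1.0, 5.0, 5.0],
    [1.0, 0.0, 5.0, 5.0],
    [5.0, 5.0, 0.0, 1.0],
    [5.0, 5.0, 1.0, 0.0],
]
```

Only the ordering of distances within a quartet matters for choosing the
split, which is why this kind of qualitative information can remain
usable even when the absolute distances between dissimilar shapes are
not trustworthy.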

Attribute-based exploration. An alternative approach is to allow users to
interactively explore shapes with continuously valued semantic
attributes. Blanz and Vetter [BV99] provide an interface to explore faces
based on continuous facial attributes, such as “smile” or “frown”, built
upon the face parametric model (Section 6). Similarly, Allen et al.
[ACP03] allow users to explore the range of human bodies with features
such as height, weight, and age. The interface of Chaudhuri et al.
[CKG*13] enables exploration of shape parts according to learned
strengths of semantic attributes (Figure 5).
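
A minimal Python sketch of attribute-driven exploration is given below.
It assumes, hypothetically, that all shapes are stored as flat coordinate
vectors in full correspondence and that each shape carries a scalar
attribute rating; the attribute direction is then taken as the
rating-weighted sum of deviations from the mean shape. This is a
simplification for illustration and should not be read as the exact
procedure of [BV99] or [ACP03].

```python
def mean_shape(shapes):
    """Coordinate-wise mean of shapes given as equal-length lists."""
    n = len(shapes)
    return [sum(s[k] for s in shapes) / n for k in range(len(shapes[0]))]


def attribute_direction(shapes, ratings):
    """Rating-weighted sum of deviations from the mean shape.

    ratings[i] states how strongly shape i exhibits the attribute.
    """
    mean = mean_shape(shapes)
    mean_rating = sum(ratings) / len(ratings)
    direction = [0.0] * len(mean)
    for shape, rating in zip(shapes, ratings):
        for k in range(len(mean)):
            direction[k] += (rating - mean_rating) * (shape[k] - mean[k])
    return direction


def explore(shape, direction, amount):
    """Move a shape along the attribute direction (e.g. a slider value)."""
    return [x + amount * d for x, d in zip(shape, direction)]


# Two toy "shapes" with two coordinates each; only the second coordinate
# varies with the attribute.
shapes = [[1.0, 0.0], [1.0, 2.0]]
ratings = [0.0, 1.0]
direction = attribute_direction(shapes, ratings)
```

Binding `amount` to a slider gives a continuous control in the spirit of
the interfaces described above: positive values strengthen the attribute
and negative values weaken it.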


10. Conclusion 

In this survey, we discussed the state-of-the-art on data-driven methods
for 3D shape analysis and processing. We also presented the main concepts
and methodologies used to develop such methods. We hope that this survey
will act as a tutorial that will help researchers develop new data-driven
algorithms related to shape processing. There are several exciting
research directions that have not been sufficiently explored so far in
our community that we discuss below:

Joint analysis of 2D and 3D data. Generating 3D content from images
requires building mappings from 2D to 3D space. The problem is largely
ill-posed; however, with the help of the vast amount of 2D images
available on the web, effective priors can be developed to map 2D visual
elements or features to 3D shape and scene representations. Initial
attempts to build alignments between 2D and 3D data are the recent works
by Su et al. [SHM*14] and Aubry et al. [AME*14], which can further
inspire more work on this topic. Another possibility is to jointly
analyze shape and texture data. The work on co-segmenting textured 3D
shapes by Yumer et al. [YCM14] is one such example. Following this line,
it would be interesting to jointly analyze and process multi-modal visual
data, including depth scans and videos. The key challenge is how to
integrate the heterogeneous information in a unified learning framework.

Better and scalable shape analysis techniques. Many 
data-driven applications rely on high-quality shape analysis 
results, particularly in segmentations and correspondences. 
We believe it is important to further advance the research 
in both directions. This includes designing shape analysis 
techniques for specific data and/or making them scalable to 
gigantic datasets. 

From geometry to semantics and vice versa. Several data-driven methods
have tried to map 2D and 3D geometric data to high-level concepts, such
as shape categories, semantic attributes, or part labels. Existing
methods deal with cases where only a handful of different entities are
predicted for input shapes or scenes. Scaling these methods to handle
thousands or more categories, part labels, and other such entities, as
well as approaching human performance, is an open problem. The opposite
direction is also interesting and insufficiently explored: generating or
editing shapes and scenes based on high-level specifications, such as
shape styles, attributes, or even natural language, potentially combined
with other input, such as sketches and interactive handles. WordsEye
[CS01] was an early attempt to bridge this gap, yet it requires largely
manual mappings. The more recent work by Chaudhuri et al. [CKG*13]
handles only shape part replacements driven by linguistic attributes.

Understanding function from geometry. The geometry of a shape is strongly
related to its functionality, including its relationship to human
activity. Thus, analyzing shapes and scenes requires some understanding
of their function. The recent works by Laga et al. [LMS13] and Kim et al.
[KCGF14] are important examples of data-driven approaches that take into
account functional aspects in shape analysis. In addition, data-driven
methods can guide the synthesis of shapes that can be manufactured or 3D
printed based on given functional specifications; an example of such an
attempt is the work by Schulz et al. [SSL*14].

Data-driven shape abstractions. It is relatively easy for humans to
communicate the essence of shapes with a few lines, sketches, and
abstract forms. Developing methods that can build such abstractions
automatically has significant applications in shape and scene
visualization, artistic rendering, and shape analysis. There are a few
data-driven approaches to line drawing [CGL*08, KNS*09, KNBH12], saliency
analysis [CSPF12], surface abstraction [YK12], and viewpoint preferences
[SLF*11] related to this goal. Matching human performance in these tasks
is still a largely open question, while synthesizing and editing shapes
using shape abstractions as input remains a significant challenge.

Feature learning. Several shape and scene processing tasks depend on
designing geometric descriptors for points and shapes, as we show in
Section 3. In general, it seems that some descriptors work well in some
specific classes, but fail in several others. A main issue is that there
are no geometric features that can serve as reliable mid- or high-level
representations of shapes. Recent work in computer vision shows that
features can be learned from data in the case of 2D and 3D images
[YN10, LBF13], thus a promising direction is to extend this work for
learning feature representations from raw 3D geometric data.

Work       | Training data                     | Feature        | Learning model/approach         | Learning type   | Learning outcome      | Application
           | Rep.  | Preproc.      | Scale     | Type    | Sel. |                                 |                 |                       |
-----------+-------+---------------+-----------+---------+------+---------------------------------+-----------------+-----------------------+---------------
[FKMS05]   | Point | No            | Thousands | Local   | No   | SVM classifier                  | Supervised      | Object classifier     | Classification
[BBOG11]   | Mesh  | No            | Thousands | Local   | No   | Similarity Sensitive Hashing    | Supervised      | Distance metric       | Classification
[HSG13]    | Mesh  | Pre-align.    | Thousands | Local   | No   | Max-marginal distance learning  | Semi-supervised | Distance metric       | Classification
[KHS10]    | Mesh  | No            | Tens      | Local   | Yes  | Jointboost classifier           | Supervised      | Face classifier       | Segmentation
[vKTS*11]  | Mesh  | Yes           | Tens      | Local   | Yes  | Gentleboost classifier          | Supervised      | Face classifier       | Segmentation
[BLVD11]   | Mesh  | No            | Tens      | L.&G.   | Yes  | Adaboost classifier             | Supervised      | Boundary classifier   | Segmentation
[XXLX14]   | Mesh  | No            | Hundreds  | Local   | Yes  | Feedforward neural networks     | Supervised      | Face/patch classifier | Segmentation
[XSX*14]   | Mesh  | Pre-seg.      | Tens      | Local   | No   | Sparse model selection          | Supervised      | Segment similarity    | Segmentation
[LCHB12]   | Mesh  | No            | Tens      | Local   | Yes  | Entropy regularization          | Semi-supervised | Face classifier       | Segmentation
[WAvK*12]  | Mesh  | Pre-seg.      | Hundreds  | Local   | No   | Active learning                 | Semi-supervised | Segment classifier    | Segmentation
[WGW*13]   | Image | Labeled parts | Hundreds  | Local   | No   | 2D shape matching               | Supervised      | 2D shape similarity   | Segmentation
[HFL12]    | Mesh  | Over-seg.     | Tens      | Local   | Yes  | Subspace clustering             | Unsupervised    | Patch similarity      | Seg. / Corr.
[SvKK*11]  | Mesh  | Pre-seg.      | Tens      | Local   | No   | Spectral clustering             | Unsupervised    | Seg. simi./classifier | Seg. / Corr.
[XLZ*10]   | Mesh  | Part          | Tens      | Struct. | No   | Spectral clustering             | Unsupervised    | Part proportion simi. | Seg. / Corr.
[vKXZ*13]  | Mesh  | Part          | Tens      | Struct. | No   | Multi-instance clustering       | Unsupervised    | Seg. hier. simi.      | Seg. / Corr.
[GF09]     | Mesh  | No            | Tens      | Global  | No   | Global shape alignment          | Unsupervised    | Face similarity       | Seg. / Corr.
[HKG11]    | Mesh  | Pre-seg.      | Tens      | Local   | No   | Joint part matching             | Unsupervised    | Segment similarity    | Seg. / Corr.
[HWG14]    | Mesh  | Init. corr.   | Tens      | Global  | No   | Consistent func. map networks   | Unsupervised    | Segment similarity    | Seg. / Corr.
[KLM*13]   | Mesh  | Template      | Thousands | Local   | No   | Shape alignment                 | Semi-supervised | Templates             | Seg. / Corr.
[MPM*14]   | Mesh  | Over-seg.     | Hundreds  | Local   | No   | Density-based clustering        | Unsupervised    | Patch similarity      | Recognition
[NBCW*11]  | Mesh  | Init. corr.   | Tens      | L.&G.   | No   | Inconsistent map detection      | Unsupervised    | Point similarity      | Corr. / Expl.
[HZG*12]   | Mesh  | Init. corr.   | Tens      | L.&G.   | No   | MRF joint matching              | Unsupervised    | Point similarity      | Corr. / Expl.
[KLM*12]   | Mesh  | Pre-align.    | Tens      | Global  | No   | Spectral matrix recovery        | Unsupervised    | Point similarity      | Corr. / Expl.
[HG13]     | Mesh  | Init. corr.   | Tens      | Global  | No   | Low-rank matrix recovery        | Unsupervised    | Point similarity      | Corr. / Expl.
[OLGM11]   | Mesh  | Part          | Hundreds  | Global  | No   | Manifold learning               | Unsupervised    | Parametric model      | Exploration
[ROA*13]   | Mesh  | Map           | Tens      | None    | N/A  | Functional map analysis         | Unsupervised    | Difference operator   | Exploration
[FAvK*14]  | Mesh  | Labeled parts | Hundreds  | Struct. | No   | Kernel Density Estimation       | Supervised      | Prob. distributions   | Expl. / Synth.
[AKZM14]   | Mesh  | [KLM*13]      | Thousands | Struct. | No   | Manifold learning               | Unsupervised    | Parametric models     | Expl. / Synth.
[HSS*13]   | Mesh  | No            | Hundreds  | Global  | No   | Quartet analysis and clustering | Unsupervised    | Distance measure      | Organization
[BV99]     | Mesh  | Pre-align.    | Hundreds  | Local   | No   | Principal Component Analysis    | Unsupervised    | Parametric model      | Recon. / Expl.
[ACP03]    | Point | Pre-align.    | Hundreds  | Local   | No   | Principal Component Analysis    | Unsupervised    | Parametric model      | Recon. / Expl.
[HSS*09]   | Point | Pre-align.    | Hundreds  | Local   | No   | PCA & linear regression         | Unsupervised    | Parametric model      | Recon. / Expl.
[PMG*05]   | Mesh  | Pre-align.    | Hundreds  | Global  | No   | Global shape alignment          | Unsupervised    | Shape similarity      | Reconstruction
[NXS12]    | Point | Labeled parts | Hundreds  | Struct. | No   | Random Forest Classifier        | Supervised      | Object classifier     | Reconstruction
[SFCH12]   | Mesh  | Labeled parts | Tens      | Global  | No   | Part matching                   | Unsupervised    | Part detector         | Reconstruction
[KMYG12]   | Point | Labeled parts | Tens      | Local   | No   | Joint part fitting and matching | Unsupervised    | Object detector       | Reconstruction
[SMNS*13]  | Mesh  | No            | Tens      | L.&G.   | No   | Shape matching                  | Unsupervised    | Object detector       | Reconstruction
[XZZ*11]   | Mesh  | Labeled parts | Tens      | Struct. | No   | Structural shape matching       | Unsupervised    | Part detector         | Modeling
[AME*14]   | Mesh  | Projected     | Thousands | Visual  | No   | Linear Discriminant Analysis    | Supervised      | Object detector       | Recognition
[SHM*14]   | Mesh  | Projected     | Tens      | Visual  | No   | Shape matching                  | Unsupervised    | 2D-3D correlation     | Reconstruction
[CK10b]    | Mesh  | No            | Thousands | Global  | No   | Shape matching                  | Unsupervised    | Part detector         | Modeling
[CKGK11]   | Mesh  | [KHS10]       | Hundreds  | Local   | No   | Bayesian Network                | Unsupervised    | Part reasoning model  | Modeling
[XXM*13]   | Mesh  | Labeled parts | Tens      | Struct. | No   | Contextual part matching        | Unsupervised    | Part detector         | Modeling
[KCKK12]   | Mesh  | [KHS10]       | Hundreds  | L.&G.   | No   | Bayesian Network                | Unsupervised    | Shape reasoning model | Synthesis
[XZCOC12]  | Mesh  | Part          | Tens      | Struct. | No   | Part matching                   | Unsupervised    | Part similarity       | Synthesis
[TYK*12]   | Mesh  | Labeled parts | Tens      | Struct. | No   | Structured concept learning     | Unsupervised    | Probabilistic grammar | Synthesis
[YK12]     | Mesh  | No            | Tens      | Global  | No   | Shape matching                  | Unsupervised    | Shape abs. similarity | Modeling
[YK14]     | Mesh  | Pre-seg.      | Tens      | Local   | No   | Segment matching                | Unsupervised    | Segment abs. simi.    | Modeling
[CKG*13]   | Mesh  | [KHS10]       | Hundreds  | L.&G.   | No   | SVM ranking                     | Supervised      | Ranking metric        | Model. / Expl.
[FSH11]    | Scene | Labeled obj.  | Tens      | Struct. | No   | Relevance feedback              | Supervised      | Contextual obj. simi. | Classification
[FRS*12]   | Scene | Labeled obj.  | Hundreds  | Struct. | No   | Bayesian Network                | Supervised      | Mixture models        | Synthesis
[XCF*13]   | Scene | Labeled obj.  | Hundreds  | Struct. | No   | Frequent subgraph mining        | Unsupervised    | Frequent obj. groups  | Modeling
[XMZ*14]   | Scene | Labeled obj.  | Hundreds  | Struct. | No   | Weighted subgraph mining        | Unsupervised    | Distinct obj. groups  | Org. / Expl.
[LCK*14]   | Scene | Labeled hier. | Tens      | Struct. | No   | Probabilistic learning          | Supervised      | Probabilistic grammar | Seg. / Corr.

Table 4: Comparison of various works on data-driven shape analysis and processing. Each work is summarized according to the
criteria defined for data-driven methods: training data (data representation, preprocessing, and scale), feature (feature type
and whether feature selection is involved), learning model or approach, learning type (supervised, semi-supervised, or
unsupervised), learning outcome (e.g., a classifier or a distance metric), and typical application scenario. See the text for a
detailed explanation of the criteria. Some works employ another work as a preprocessing stage (e.g., [CKG*13] requires the
labeled segmentation produced by [KHS10]). Feature types are local geometric features (Local), global shape descriptors
(Global), both local and global shape features (L.&G.), structural features (Struct.), and 2D visual features (Visual).
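
The criteria in Table 4 can be read as a fixed record schema, which makes the comparison easy to query. The following is a minimal sketch under that reading, not part of the surveyed works: the `Work` class, its field names, and the `select` helper are hypothetical names introduced here for illustration, and only four rows of the table are transcribed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    """One row of Table 4; field names are illustrative assumptions."""
    ref: str                 # citation key
    representation: str      # training data: representation
    preprocessing: str       # training data: preprocessing
    scale: str               # training data: scale
    feature_type: str        # Local, Global, L.&G., Struct., or Visual
    feature_selection: bool  # whether feature selection is involved
    model: str               # learning model or approach
    learning_type: str       # Supervised, Semi-supervised, or Unsupervised
    outcome: str             # learning outcome
    application: str         # typical application scenario


# Four rows transcribed from Table 4 (a subset, for illustration only).
ROWS = [
    Work("[NXS12]", "Point", "Labeled parts", "Hundreds", "Struct.", False,
         "Random Forest Classifier", "Supervised", "Object classifier",
         "Reconstruction"),
    Work("[AME*14]", "Mesh", "Projected", "Thousands", "Visual", False,
         "Linear Discriminant Analysis", "Supervised", "Object detector",
         "Recognition"),
    Work("[KCKK12]", "Mesh", "[KHS10]", "Hundreds", "L.&G.", False,
         "Bayesian Network", "Unsupervised", "Shape reasoning model",
         "Synthesis"),
    Work("[FRS*12]", "Scene", "Labeled obj.", "Hundreds", "Struct.", False,
         "Bayesian Network", "Supervised", "Mixture models", "Synthesis"),
]


def select(rows, **criteria):
    """Return the rows whose fields equal every given criterion."""
    return [r for r in rows
            if all(getattr(r, key) == value for key, value in criteria.items())]


# Example query: supervised methods among the transcribed rows.
supervised_refs = [w.ref for w in select(ROWS, learning_type="Supervised")]
```

Keeping each criterion as a separate field, rather than one free-text description per work, is what allows queries such as "all Bayesian Network models trained on scenes" to be expressed as simple equality filters.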